cs.CR
Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems
Proposes a topology-aware multi-hop attack method targeting LLM-based multi-agent systems.
cs.CR
WildCode: An Empirical Analysis of Code Generated by ChatGPT
Analyzes the security of ChatGPT-generated code and its potential vulnerabilities.
cs.CR
One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
Proposes a robust and adaptive method for detecting malicious packages.
cs.CR
AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning
Uses reinforcement learning to build a proactive defense layer for DevSecOps.
cs.CR
A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution
Designs a highly secure file format for distributing large language models.
cs.CR
PBFuzz: Agentic Directed Fuzzing for PoV Generation
Proposes an agent-based method for generating vulnerability PoVs.
cs.CR
Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
Quantifies the risk of memory leakage in multi-agent LLM systems.
cs.CR
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Builds a causality analysis framework for LLM security.
cs.CR
Personalizing Agent Privacy Decisions via Logical Entailment
Personalizes agent privacy decisions using logical entailment.
cs.CR
Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection
Compares retrieval-augmented prompting with fine-tuning for code vulnerability detection.
cs.CR
ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications
Designs a security threat modeling platform for AI agent applications.
cs.CR
UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks
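Proposes a framework for training neural networks that are robust to backdoor attacks, defending against injected malicious samples by purifying the data and strengthening model robustness.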
cs.CR
AudAgent: Automated Auditing of Privacy Policy Compliance in AI Agents
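Develops a visualization tool that monitors the data practices of AI agents in real time, ensuring compliance with their privacy policies to guard against data leakage.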
cs.CR
AI Kill Switch for malicious web-based LLM agent
Designs an emergency-stop mechanism for AI that can immediately block illicit operations by malicious web-based LLM agents.
cs.CR
Detection of AI Deepfake and Fraud in Online Payments Using GAN-Based Models
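Builds GAN-based models to detect deepfakes and payment fraud, improving the ability of online transaction systems to identify AI-generated content.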
cs.CR
In-Context Representation Hijacking
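Proposes an in-context representation hijacking attack that substitutes keywords so that the LLM implicitly learns harmful semantics.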
cs.CL
EvoEdit: Lifelong Free-Text Knowledge Editing through Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion
Develops a free-text knowledge editing method that updates model knowledge through latent-space perturbation and parameter fusion.
cs.CL
EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
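Proposes a reliable knowledge editing framework that addresses the stability of model knowledge updates by editing in stages and then consolidating the knowledge.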
cs.CL
DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution
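Proposes a framework for detecting AI-generated content in mixed-authorship text, using stylistic feature analysis to verify authenticity.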
cs.CL
Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking
Improves the factual accuracy of retrieval through contrastive learning and interpretable re-ranking, mitigating AI hallucination.
cs.CL
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Proposes an iterative alignment framework for healthcare AI assistants that balances safety and helpfulness.
cs.CL
Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment
Mitigates visual object and action hallucinations in multimodal models through self-augmented contrastive alignment.
cs.CL
SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding
Designs self-diagnostic contrastive decoding to address temporally inconsistent hallucinations in video LLMs.
cs.CL
Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
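Proposes a research framework for the ethics and safety of multi-agent LLM systems from a mechanistic interpretability perspective.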
cs.CL
Multi-LLM Collaboration for Medication Recommendation
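Proposes a multi-LLM collaboration method that addresses hallucination and inconsistency in medication recommendation, improving the reliability of clinical decisions.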
cs.CL
Grounding LLM Reasoning with Knowledge Graphs
Combines knowledge graphs with LLM reasoning, using structured data to make the reasoning more verifiable and reliable.
cs.CL
Control Illusion: The Failure of Instruction Hierarchies in Large Language Models
Reveals that instruction hierarchy control in LLMs fails, and proposes a framework for evaluating constraint prioritization.
cs.CL
ChatGPT for President! Presupposed content in politicians versus GPT-generated texts
Analyzes the manipulative potential of GPT-generated text in political discourse, revealing the risk of generating disinformation.
cs.CL
QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
Aligns LLMs with principles through a decomposed reward mechanism, improving the interpretability and safety of model outputs.
cs.CL
An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
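Builds a benchmark of mathematically equivalent transformations to test the robustness of LLM mathematical reasoning and to assess sensitivity to non-mathematical perturbations.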
cs.CL
SeSE: A Structural Information-Guided Uncertainty Quantification Framework for Hallucination Detection in LLMs
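Proposes an uncertainty quantification framework based on semantic structural entropy for detecting hallucinations in large language models.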
cs.CL
Large language models can learn and generalize steganographic chain-of-thought under process supervision
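Studies how covert reward hacking can be, and proposes an extended method to keep chain-of-thought (CoT) monitoring from failing.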
cs.CL
Dual-branch Prompting for Multimodal Machine Translation
Designs a dual-branch prompting framework that makes multimodal translation more robust to visual noise.
cs.CL
TaoSR1: The Thinking Model for E-commerce Relevance Search
Addresses CoT error accumulation and discriminative hallucination when LLMs are used for e-commerce search.
cs.LG
Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness
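Proposes improving model robustness through adversarial training and a range of activation functions, and studies security protection under non-IID data.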
cs.LG
Distance Is All You Need: Radial Dispersion for Uncertainty Estimation in Large Language Models
Proposes the Radial Dispersion Score (RDS) for parameter-free uncertainty estimation in LLMs, improving system reliability.
cs.LG
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Proposes SQDF to mitigate reward over-optimization in diffusion models, improving the diversity and safety of generated samples.
cs.LG
Reliable Statistical Guarantees for Conformal Predictors with Small Datasets
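Provides a reliable confidence quantification method for small datasets, supporting model trustworthiness in safety-critical settings.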
cs.LG
Temp-SCONE: A Novel Out-of-Distribution Detection and Domain Generalization Framework for Wild Data with Temporal Shift
Designs the Temp-SCONE framework for out-of-distribution detection in dynamic environments, improving model safety under temporal shift.
cs.LG
Value Gradient Guidance for Flow Matching Alignment
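Proposes VGG-Flow, which uses gradient matching to align flow matching models, improving the agreement between generative models and human preferences.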
cs.LG
Patient Safety Risks from AI Scribes: Signals from End-User Feedback
Reveals patient safety risks that AI scribes may introduce, noting that transcription errors can create clinical safety hazards.
cs.LG
Pick-to-Learn for Systems and Control: Data-driven Synthesis with State-of-the-art Safety Guarantees
Proposes a data-driven method for systems and control that provides rigorous safety guarantees for complex environments.
cs.LG
Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models
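Finds significant social bias in images generated by LVLM-based models, and proposes a multi-level prompt benchmark for assessing the impact of that bias.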
cs.LG
Bant: Byzantine Antidote via Trial Function and Trust Scores
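Proposes a Byzantine fault tolerance method that combines trust scores with a trial function, dynamically filtering anomalous updates to improve the security of distributed learning.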
cs.LG
Joint Discriminative-Generative Modeling via Dual Adversarial Training
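Designs a joint discriminative-generative model that uses adversarial training to improve robustness and sample quality across classification and generation.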
cs.LG
Bilevel Models for Adversarial Learning and A Case Study
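Builds bilevel models to analyze adversarial attack mechanisms, quantifying the robustness of learning models and revealing the impact of attacks.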
cs.LG
Incoherent Beliefs & Inconsistent Actions in Large Language Models
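Reveals that large language models hold inconsistent beliefs in dynamic environments, and proposes a method for evaluating the robustness of their sequential decision-making.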
cs.LG
Multi-Modal Machine Learning for Early Trust Prediction in Human-AI Interaction Using Face Image and GSR Bio Signals
Develops a multimodal framework for predicting users' trust in AI, improving safety and reliability in human-AI interaction.
cs.LG
The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework
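Proposes an autonomy-alignment framework for open-ended learning robots, ensuring that autonomous learning stays consistent with human values.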