cs.CR
Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning
Proposes a data-free method combining model inversion and selective unlearning to protect LLM privacy.
cs.CR
Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems
Designs a knowledge-graph-guided crawler attack that leaks sensitive information from RAG systems.
cs.CR
Introducing the Generative Application Firewall (GAF)
Proposes the Generative Application Firewall (GAF), which unifies LLM security protection mechanisms.
cs.CR
PAL*M: Property Attestation for Large Generative Models
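Builds a property attestation framework for large generative models, enabling trustworthy verification at both training and inference time.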
cs.CR
RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models
Develops a resource-efficient adversarial prompting method for defending against LLM jailbreak attacks.
cs.CR
Multi-Targeted Graph Backdoor Attack
Proposes the first multi-targeted backdoor attack on graph neural networks, implanting multiple triggers.
cs.CR
Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment Platform
Studies the security of LLM-as-a-service for small businesses, addressing deployment security issues in RAG scenarios.
cs.CR
Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models
Combines predictive coding and the information bottleneck for efficient hallucination detection in LLMs.
cs.CR
zkFinGPT: Zero-Knowledge Proofs for Financial Generative Pre-trained Transformers
Designs a zero-knowledge proof scheme to verify the legitimacy of financial GPT model weights and outputs.
cs.CR
Learning to Watermark in the Latent Space of Generative Models
Proposes latent-space watermarking that protects the copyright of AI-generated content across models.
cs.CR
Is Your Writing Being Mimicked by AI? Unveiling Imitation with Invisible Watermarks in Creative Writing
Develops invisible watermarks to detect AI imitation of creative writing.
cs.CR
VTarbel: Targeted Label Attack with Minimal Knowledge on Detector-enhanced Vertical Federated Learning
Designs a targeted label attack on vertical federated learning that circumvents its privacy protection mechanisms.
cs.CR
VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking
Proposes VMask, a layer-masking framework for label privacy protection in vertical federated learning.
cs.CR
Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks
Evaluates how well machine unlearning defends against membership inference attacks, with the aim of improving model privacy.
cs.CR
Attacks on Approximate Caches in Text-to-Image Diffusion Models
Proposes attacks on approximate caches in text-to-image diffusion models, exposing user-isolation vulnerabilities.
cs.CR
TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
Designs an adversarial noise-injection mechanism that defends against text embedding inversion while preserving task utility.
cs.CR
VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference
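Builds a publicly verifiable framework for decentralized inference that ensures the correctness and security of model outputs.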
cs.CR
CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense
Proposes a data extraction defense that strengthens privacy protection through a dynamic loss and a consistency decoding strategy.
cs.CR
Can LLM Infer Risk Information From MCP Server System Logs?
Studies whether LLMs can infer risk information from MCP server system logs, revealing potential security threats.
cs.CR
Real-World Adversarial Attacks on RF-Based Drone Detectors
Designs physical-layer adversarial examples that disrupt RF-signal-based drone detection systems.
cs.CR
Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
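Develops a black-box optimization method for generating adversarial inputs to vision-language models, bypassing their safety mechanisms.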
cs.CR
Membership Inference Attacks on LLM-based Recommender Systems
Designs membership inference attacks that reveal user privacy leakage risks in LLM-based recommender systems.
cs.CL
Intelligence Degradation in Long-Context LLMs: Critical Threshold Determination via Natural Length Distribution Analysis
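Identifies a performance collapse in long-context models and proposes a threshold analysis method based on natural length distributions.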
cs.CL
Can We Trust LLM Detectors?
Improves LLM detector robustness by building perturbation-resistant style embeddings via contrastive learning.
cs.CL
Memorization Dynamics in Knowledge Distillation for Language Models
Analyzes training-data memorization dynamics in knowledge distillation and explores privacy protection mechanisms.
cs.CL
Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering
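Empirically evaluates chunking, retrieval, and re-ranking in RAG pipelines to reduce LLM hallucination in policy-document question answering and improve information accuracy.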
cs.CL
Multi-Persona Thinking for Bias Mitigation in Large Language Models
Proposes a multi-persona reasoning framework that reduces model bias through multi-perspective dialogue.
cs.CL
AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains
Builds an adversarial factuality benchmark that tests models' ability to detect and resist misleading information.
cs.CL
YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models
Designs an interpretable safety guardrail model that enables fine-grained risk assessment.
cs.CL
ToxiTwitch: Toward Emote-Aware Hybrid Moderation for Live Streaming Platforms
Develops a hybrid moderation system for detecting toxic behavior on live-streaming platforms.
cs.CL
Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation
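Proposes a benchmark for confidence estimation in multi-turn interactions, improving the reliability of medical consultation.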
cs.CL
Persona Switch: Mixing Distinct Perspectives in Decoding Time
Dynamically mixes multiple persona perspectives at decoding time, improving model robustness to adversarial prompts.
cs.CL
Hallucination Mitigating for Medical Report Generation
Proposes a hallucination mitigation framework for medical report generation, improving the accuracy of medical text.
cs.CL
Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs
Evaluates how representative demographically aligned models are, to inform safety alignment strategies.
cs.CL
Universal Refusal Circuits Across LLMs: Cross-Model Transfer via Trajectory Replay and Concept-Basis Reconstruction
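Proposes a framework for transferring refusal behavior across models, migrating universal refusal circuits via concept-basis reconstruction.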
cs.CL
LLM-in-Sandbox Elicits General Agentic Intelligence
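Proposes an LLM sandbox environment and verifies that models can autonomously access resources and execute scripts on non-coding tasks.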
cs.CL
Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
Designs a determinism-verification harness for financial agents, addressing the reproducibility and compliance of LLM decisions.
cs.CL
Abusive music and song transformation using GenAI and LLMs
Develops a generative-AI system that sanitizes music content, removing harmful material through semantic reconstruction.
cs.CL
Beyond Prompting: Efficient and Robust Contextual Biasing for Speech LLMs via Logit-Space Integration (LOGIC)
Proposes a logit-space integration method for contextual biasing that makes speech LLMs more robust on emerging entities.
cs.CL
CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation
Builds a multi-task medical report generation framework that addresses visual-text alignment and factual consistency.
cs.CL
Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models
Designs a zero-sum game evaluation framework to quantify how subtle and harmful LLM sycophancy is.
cs.CL
Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
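Reveals how knowledge conflicts affect LLM multi-step reasoning and proposes an evaluation method for dynamic knowledge overrides.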
cs.CL
PRISM: Deriving the Transformer as a Signal-Denoising Operator via Maximum Coding Rate Reduction
Reformulates the Transformer as a signal-denoising operator, improving model interpretability and safety.
cs.CL
Agentic Uncertainty Quantification
Proposes a unified dual-process uncertainty quantification method to address hallucination in AI agents.
cs.CL
Agentic Confidence Calibration
Develops a confidence calibration method for agents, addressing overconfidence in long-horizon tasks.
cs.CL
ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models
Proposes ErrorMap, a method for systematically analyzing the causes and patterns of LLM failures.
cs.CL
Evaluating and Achieving Controllable Code Completion in Code LLM
Studies the controllability of code completion, improving models' ability to follow user instructions.
cs.CL
Controlling Long-Horizon Behavior in Language Model Agents with Explicit State Dynamics
Uses explicit state dynamics to control behavioral consistency in language model agents over long-horizon interactions.
cs.CL
NP-Hard Lower Bound Complexity for Semantic Self-Verification
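Proves an NP-hard lower bound on the complexity of semantic self-verification, providing a theoretical foundation for AI safety verification.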
cs.CL
Poor Alignment and Steerability of Large Language Models: Evidence from College Admission Essays
Reveals alignment flaws and poor steerability of LLMs when generating college admission essays.
cs.CL
Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs
Reveals affective hallucination in LLMs and proposes the AHaBench evaluation framework for detecting false emotional responses.
cs.CL
Collaborate, Deliberate, Evaluate: How LLM Alignment Affects Coordinated Multi-Agent Outcomes
Studies how LLM alignment methods affect multi-agent collaboration outcomes, emphasizing behavioral predictability.
cs.CL
Do You Feel Comfortable? Detecting Hidden Conversational Escalation in AI Chatbots
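Detects hidden harm in AI chatbot conversations, addressing gradually escalating psychological risks that conventional filters fail to identify.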
cs.CL
Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
Enables fine-grained control over LLM refusal behavior on sensitive topics, making content safety more controllable.
cs.CL
Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty
Builds an evaluation framework for abstention decisions in medical LLMs, strengthening safe decision-making in clinical settings.
cs.CL
Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains
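Proposes an adversarial alignment framework that uses adversarial training to improve model value consistency in sensitive domains.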
cs.CL
How malicious AI swarms can threaten democracy: The fusion of agentic AI and LLMs marks a new frontier in information warfare
Analyzes the threat malicious AI swarms pose to democracy, revealing their potential risks in information warfare.
cs.CL
Generalization to Political Beliefs from Fine-Tuning on Sports Team Preferences
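Reveals that fine-tuning on sports team preferences unexpectedly generalizes to political beliefs, highlighting challenges for safety alignment.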
cs.CL
Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation
Reveals how unfaithful chain-of-thought can deceive agent evaluation, exposing vulnerabilities in AI evaluation systems.
cs.LG
QUAIL: Quantization Aware Unlearning for Mitigating Misinformation in LLMs
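Proposes quantization-aware unlearning to prevent unlearned information from resurfacing after low-bit quantization, improving model privacy protection.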
cs.LG
Beyond Hard Writes and Rigid Preservation: Soft Recursive Least-Squares for Lifelong LLM Editing
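Proposes soft recursive least-squares for lifelong LLM editing, balancing continual edits against stability and addressing safety alignment issues during model updates.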
cs.LG
Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models
Locates safety vectors via global optimization, strengthening LLM defenses against jailbreak attacks and improving system security.
cs.LG
SoK: Challenges in Tabular Membership Inference Attacks
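Systematically analyzes membership inference attacks on tabular data, revealing privacy leakage risks and advancing privacy protection research.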
cs.LG
Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing
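Proposes feature-space smoothing that gives multimodal LLMs provable adversarial robustness, defending against feature perturbation attacks.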
cs.LG
ICON: Invariant Counterfactual Optimization with Neuro-Symbolic Priors for Text-Based Person Search
Proposes a robust method that addresses distribution shift and spatial-semantic alignment in text-based person search.
cs.LG
On damage of interpolation to adversarial robustness in regression
Studies the negative effect of interpolation on adversarial robustness and provides a theoretical analysis.
cs.LG
Toward Robust Semi-supervised Regression via Dual-stream Knowledge Distillation
Designs a distributionally robust semi-supervised regression framework to improve model generalization.
cs.LG
Distributionally Robust Causal Abstractions
Builds a theoretical framework for causal abstractions with distributional robustness.
cs.LG
Integrating Neural Differential Forecasting with Safe Reinforcement Learning for Blood Glucose Regulation
Proposes a safety-enhanced reinforcement learning controller for risk-aware blood glucose regulation.
cs.LG
Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance
Proposes the AF-SMOTE framework, which uses adversarial filtering to improve classification reliability under extreme class imbalance.
cs.LG
Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment
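Builds the Sigma model to address temporal alignment between semantics and control, improving the stability of vision-language-action systems.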
cs.LG
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
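Proposes ArenaRL, which uses tournament-based relative ranking to address reward model bias in reinforcement learning for open-ended tasks.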
cs.LG
Your Group-Relative Advantage Is Biased
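Reveals a theoretical bias in group-relative advantage estimation, with implications for reinforcement learning used in safety alignment.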
cs.LG
Contrastive and Multi-Task Learning on Noisy Brain Signals with Nonlinear Dynamical Signatures
Designs a multi-task framework for noisy brain signals, improving the robustness of dynamical modeling.
cs.LG
Information-theoretic Distinctions Between Deception and Confusion
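Builds an information-theoretic model that distinguishes deception from confusion, deepening understanding of AI safety failure mechanisms.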
cs.LG
DECOR: Deep Embedding Clustering with Orientation Robustness
Proposes the DECOR clustering framework to handle imbalance and complexity in wafer defect data.