Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Jiaqi Cao, Jiarui Wang, Rubin Wei, Qipeng Guo, Kai Chen (AI Lab)
arXiv, August 2025
Fuzhou University, 2025-9-1
1. Abstract
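LLMs are hard to adapt to specific domains: DAPT requires costly full-parameter training, and RAG adds inference latency. Memory Decoder is a small Transformer decoder trained to imitate an external retriever.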
Perplexity reductions of 8.02, 5.31, and 5.18 across three domains on Qwen and Llama models (average 6.17). Qwen-7B: 9.15; 1.2%; RAG: 83%.
2. Introduction
Adapting an LLM to a domain: DAPT · RAG. Memory Decoder (MemDec) instead imitates a retriever.
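MemDec plugs into an LLM without changing the LLM's parameters, as an alternative to DAPT and RAG. One MemDec serves models from 0.5B to 72B, with an average perplexity reduction of 6.17.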
3. Related Work
1. Retrieval-Augmented Generation (RAG): (Chen et al., 2017); (Guu et al., 2020; Lewis et al., 2020b; Izacard et al., 2023b); (Khandelwal et al., 2019b; He et al., 2021b; Min et al., 2022; Yogatama et al., 2021); (Chevalier et al., 2023). Memory Decoder needs no datastore access at inference time.
2. Domain Adaptation: domain-specific pretraining such as SciBERT (Beltagy et al., 2019), BioBERT (Lee et al., 2020), and ClinicalBERT (Huang et al., 2019); parameter-efficient methods such as LoRA (Hu et al., 2022) and adapters (Wang et al., 2020; Diao et al., 2021, 2023).
4. Method
Training M_mem: the target is the kNN distribution, matched with a KL divergence loss.
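As a rough illustration of what a kNN target distribution can look like, here is a minimal sketch in the style of kNN-LM: retrieved neighbors are weighted by a softmax over negative distances, and the weights are summed per token. The function name `knn_distribution`, the `temperature` parameter, and the toy neighbors are assumptions for illustration. The slide does not state how the distribution is actually built.

```python
import math


def knn_distribution(neighbors, vocab_size, temperature=1.0):
    """Turn retrieved neighbors into a probability distribution over tokens.

    neighbors: list of (distance, token_id) pairs from a nearest-neighbor search.
    Assumed form (kNN-LM style): softmax over negative distances, with the
    weights of neighbors that share a token added together.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    scores = [-dist / temperature for dist, _ in neighbors]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)

    probs = [0.0] * vocab_size
    for weight, (_, token_id) in zip(weights, neighbors):
        probs[token_id] += weight / total
    return probs


# Toy example: three equally distant neighbors, two of them voting for token 2.
toy_neighbors = [(0.0, 2), (0.0, 2), (0.0, 1)]
toy_dist = knn_distribution(toy_neighbors, vocab_size=4)
```

Such distributions are usually sparse, since only tokens that appear among the retrieved neighbors receive any mass.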
Models: Qwen2.5-1.5B and 0.5B. Hyperparameter: β.
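The slides name a KL term, a kNN distribution, and a weight β, but the formula itself did not survive. One plausible reading, sketched below as an assumption, is a convex combination of a KL term against the kNN distribution and a standard language-modeling cross-entropy term. The function names, the KL direction, and the default `beta=0.5` are all illustrative choices, not values taken from the slides.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def cross_entropy(q, target_index, eps=1e-12):
    """Negative log-probability that q assigns to the gold next token."""
    return -math.log(q[target_index] + eps)


def hybrid_loss(knn_dist, mem_dist, target_index, beta=0.5):
    """Assumed form: beta * KL(knn || mem) + (1 - beta) * CE(mem, target)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    kl_term = kl_divergence(knn_dist, mem_dist)
    lm_term = cross_entropy(mem_dist, target_index)
    return beta * kl_term + (1.0 - beta) * lm_term


# Toy distributions over a 3-token vocabulary (hypothetical numbers).
knn_dist = [0.7, 0.2, 0.1]
mem_dist = [0.5, 0.3, 0.2]
loss = hybrid_loss(knn_dist, mem_dist, target_index=0, beta=0.5)
```

With `beta=1.0` only the kNN-matching term remains, and with `beta=0.0` the loss reduces to ordinary next-token cross-entropy.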
Inference: the output distributions of M_plm and M_mem are interpolated with a weight α ∈ [0, 1]. 500 million (0.5B) parameters; 10; LLM; kNN.
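The interpolation of M_plm and M_mem can be sketched in a few lines. This sketch assumes that α weights the memory distribution and 1 − α the base model, and that both distributions cover the same vocabulary. The function name `interpolate` and the toy numbers are hypothetical.

```python
def interpolate(p_plm, p_mem, alpha):
    """Mix two next-token distributions over the same vocabulary.

    Assumed convention: alpha weights the memory model M_mem and
    (1 - alpha) weights the base model M_plm.
    """
    if len(p_plm) != len(p_mem):
        raise ValueError("both distributions must cover the same vocabulary")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * m + (1.0 - alpha) * p for p, m in zip(p_plm, p_mem)]


# Toy next-token distributions over a 3-token vocabulary (hypothetical numbers).
p_plm = [0.6, 0.3, 0.1]
p_mem = [0.1, 0.8, 0.1]
mixed = interpolate(p_plm, p_mem, alpha=0.5)
```

Because the mix is a convex combination, the result is again a valid probability distribution. Setting `alpha=0.0` recovers the base model unchanged.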
5. Experiments
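(1) Language modeling on WikiText-103 with GPT-2
(2) Downstream tasks
(3) Cross-model adaptation across Qwen models from 0.5B to 72B
(4) Cross-vocabulary adaptation to Llama
(5) Knowledge-intensive QA
(6) 13 domain-specific downstream tasks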
WikiText-103 with the GPT2 family: a 124M memory (GPT2-small size). Versus DAPT: 15.1%. GPT2-medium versus DAPT.
Downstream tasks: compared with DAPT, kNN-LM, and LoRA; tasks include CB and RTE.
Cross-model: one 0.5B (500 million parameter) Memory Decoder for Qwen2 and Qwen2.5 models from 0.5B to 72B; Qwen2-0.5B.
Cross-vocabulary: from Qwen2.5 to Llama with 10% of the training budget. Llama3-8B: 50%. Llama3.1/Llama3.2 compared with LoRA.
Knowledge-intensive QA (Geng et al., 2024; kNN-LM): TriviaQA +7.2% (68.1% → 75.3%); WebQuestions +4.8% (62.9% → 67.7%); kNN-LM: 3.1%, 5.4%.
6. Conclusion
Memory Decoder: a pretrained, plug-and-play Transformer memory for LLMs.
Thank you!