Paper Discussion - LoongRL, StreamingLLM, DuoAttention, QUEST and RetrievalAttention
March 04, 2026
Streamed on YouTube
Papers Discussed
- LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
- Efficient Streaming Language Models with Attention Sinks
- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
- QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval