Welch Labs explains how DeepSeek restructured the Transformer architecture with MLA (Multi-head Latent Attention), cutting the KV cache by 93.3% and increasing generation speed more than 6x.
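A back-of-the-envelope sketch of where the KV-cache saving comes from: standard multi-head attention caches a full key and value vector per head per token, while MLA caches only a compressed shared latent plus a small decoupled RoPE key. This is not DeepSeek's code, and the dimensions below are illustrative assumptions chosen so the reduction lands near the ~93% figure mentioned above.

```python
def mha_cache_per_token(n_heads: int, d_head: int) -> int:
    """MHA stores a full key and value vector for every head."""
    return 2 * n_heads * d_head


def mla_cache_per_token(d_latent: int, d_rope: int) -> int:
    """MLA stores one shared compressed latent (expanded back into
    per-head keys/values at attention time) plus a small decoupled
    positional (RoPE) key."""
    return d_latent + d_rope


# Illustrative dimensions (assumptions, not a specific DeepSeek config):
mha = mha_cache_per_token(n_heads=32, d_head=128)   # 8192 elements/token/layer
mla = mla_cache_per_token(d_latent=512, d_rope=64)  # 576 elements/token/layer
print(f"reduction: {1 - mla / mha:.1%}")            # ~93% with these dims
```

Because the cache is what dominates memory traffic during autoregressive decoding, shrinking it by an order of magnitude is also what enables the large throughput gains the video discusses.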
Url: https://www.youtube.com/watch?v=0VLAoVGf_74
Title: How DeepSeek Rewrote the Transformer [MLA]
Uploader: Welch Labs
Uploader ID: @WelchLabsVideo