On DeepSeek-R1's first anniversary, a new model "MODEL1" surfaces

On January 21, QuantumBit reported that DeepSeek has exposed a new model, "MODEL1", around the first anniversary of DeepSeek-R1. DeepSeek updated the FlashMLA code on GitHub, which contains 28 mentions of MODEL1 across 114 files. In that code, MODEL1 appears as a model distinct from "V32". Because V32 is understood to be DeepSeek-V3.2, MODEL1 is likely a new architecture. The code paths for the two models differ in KV cache layout, sparsity handling, and FP8 decoding, which points to changes in memory optimization.