Paper of The Day
me (neuralnets) explaining one paper per day :)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
POTD1
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
POTD2
The Leaderboard Illusion
POTD3