Online Attention Entropy
Attention entropy collapse is a training failure mode in Transformers where the attention distribution concentrates on a small number of tokens, so its entropy approaches zero. In practice, it is often poorly monitored and patched with normalization. In this note, I discuss how to extend the attention algorithm to compute entropy online, inside the same GPU kernel.
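To make the idea concrete before getting to any kernel, here is a minimal sketch in plain Python, not GPU code. It assumes the usual online-softmax setup: the logits of one query row arrive in blocks, and we keep a running max `m` and a running normalizer `l`. The only addition is a third accumulator `t`, the running sum of exp(s - m) * s, which is enough to recover the entropy as H = m + log(l) - t / l. The names `online_entropy`, `reference_entropy`, and `block_size` are hypothetical and exist only for this illustration; the input is assumed to be non-empty and finite.

```python
import math

def online_entropy(logits, block_size=4):
    """Entropy of softmax(logits), computed in one streaming pass over blocks."""
    m = -math.inf  # running max of the logits seen so far
    l = 0.0        # running sum of exp(s - m)
    t = 0.0        # running sum of exp(s - m) * s
    for start in range(0, len(logits), block_size):
        block = logits[start:start + block_size]
        m_new = max(m, max(block))
        # Rescale the old accumulators to the new max, as in online softmax.
        # On the first block m is -inf, so the scale is exp(-inf) = 0.0.
        scale = math.exp(m - m_new)
        w = [math.exp(s - m_new) for s in block]
        l = l * scale + sum(w)
        t = t * scale + sum(wi * s for wi, s in zip(w, block))
        m = m_new
    # H = log Z - E[s], with log Z = m + log(l) and E[s] = t / l.
    return m + math.log(l) - t / l

def reference_entropy(logits):
    """Two-pass reference: materialize the probabilities, then sum -p log p."""
    m = max(logits)
    w = [math.exp(s - m) for s in logits]
    z = sum(w)
    return -sum((wi / z) * math.log(wi / z) for wi in w if wi > 0.0)
```

The streaming result should agree with the two-pass reference up to floating-point error, for example on `[0.3, -1.2, 2.5, 0.0, 1.1, -0.7, 4.2]` with `block_size=3`. Uniform logits over n keys give log(n), and a single dominant logit drives the value toward zero, which is the collapse regime this note is about.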