GitHub - Daniil-Selikhanovych/Shampoo_optimizer: Our implementation of Shampoo optimizer based on https://arxiv.org/pdf/1802.09568.pdf
OPTIMIZER WASH, strongly acidic special shampoo, 200 kg
Boris Dayma 🖍️ on X: "We ran a grid search on each optimizer to find best learning rate. In addition to training faster, Distributed Shampoo proved to be better on a large