Discussion about this post

Smart Christopher

I’ve been to a lot of these idyllic picnics with beautiful souls who love to talk about pee and poop and cum. But usually at these things there are both very hot people and very ugly people. And when you get to know these people better, it always turns out that most of them—hot and ugly—are completely miserable, despite the fact that they’re also all on anti-depressants (there’s usually one or two hot gay dudes who really do seem to have found happiness). Still probably better than being a generic South Asian techie, although I really have no idea what that’s like.

Huw Evans

Given the stats cited here and elsewhere, as well as everyday experience, does anyone else feel that this model isn’t significantly different, at least not enough to justify a full version increment?

The one statistic mentioned in this overview, the 67% drop they observed, seems like it could easily have been achieved simply by editing 3.7’s system prompt.

What are folks’ theories on the version increment? Is the architecture significantly different? (I’m not talking about adding more experts to the MoE or fine-tuning on 3.7’s worst failures; I consider those minor increments rather than major ones.)

One way it could be different is if they varied several core hyperparameters to make this a wider/deeper system, but trained it on the same data or initialized inner layers to their exact 3.7 weights. That would “kick off” the 4 series by allowing them to continue scaling within the 4-series model architecture.
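For what it’s worth, here is a minimal sketch of that “grow from existing weights” idea: copy trained layers into a deeper stack and insert zero-initialized residual layers so the grown model starts out computing the same function, then continue training. This uses toy MLP blocks, not transformer layers, and every name and size is hypothetical; nothing here is a claim about how the 3.7-to-4 transition was actually done.

```python
import torch
import torch.nn as nn

D = 512  # hidden width (hypothetical)

def make_block(d):
    return nn.Linear(d, d)

# Stand-in for the old checkpoint ("3.7"): a trained shallow residual stack.
old_layers = nn.ModuleList([make_block(D) for _ in range(4)])

# The grown model ("4"): reuse the old weights and interleave new layers.
new_layers = nn.ModuleList()
for layer in old_layers:
    copied = make_block(D)
    copied.load_state_dict(layer.state_dict())  # exact copy of trained weights
    new_layers.append(copied)

    inserted = make_block(D)
    nn.init.zeros_(inserted.weight)  # zero init: with a residual connection,
    nn.init.zeros_(inserted.bias)    # the new layer contributes nothing at first,
    new_layers.append(inserted)      # so the grown model matches the old one

def forward(layers, x):
    # Residual stack: each block adds a correction to the running activation.
    for layer in layers:
        x = x + layer(x)
    return x

x = torch.randn(2, D)
assert torch.allclose(forward(old_layers, x), forward(new_layers, x))
```

The point of the zero (or identity-style) initialization is that training on the deeper/wider architecture can pick up where the old checkpoint left off instead of starting from scratch, which is what would let a lab keep scaling within the new series.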

