6 Comments

I tend to think this is simply a hardware/software architecture and dataset-classification problem. As dataset selection methodologies evolve, couldn't a large model simply be used and trained simultaneously? If NNs are coarsely modeled on mammalian brains, couldn't they also be adapted to walk and chew gum at the same time?

This is missing something: it can be more feasible to continuously update (train) a smaller model on new data while still serving predictions from it, atomically, between training micro-batches. That is not pie in the sky. It doesn't make sense to do this with GPT, but it does with a smaller model. This is the simplest such purely neural approach.
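A minimal sketch of what that interleaving could look like, assuming a small PyTorch MLP and a synthetic data stream; the class name `OnlineLearner` and the buffering scheme are illustrative, not from the comment above:

```python
# Illustrative sketch: a small model that keeps serving predictions while being
# continuously trained on incoming data, one micro-batch at a time.
import torch
import torch.nn as nn


class OnlineLearner:
    def __init__(self, in_dim=8, out_dim=1, micro_batch=32, lr=1e-3):
        self.model = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
        self.opt = torch.optim.Adam(self.model.parameters(), lr=lr)
        self.loss_fn = nn.MSELoss()
        self.micro_batch = micro_batch
        self.buffer_x, self.buffer_y = [], []

    def predict(self, x):
        # Inference runs between micro-batches, so it always sees a fully
        # applied set of weights rather than a half-updated one.
        self.model.eval()
        with torch.no_grad():
            return self.model(x)

    def observe(self, x, y):
        # Accumulate new examples; once a micro-batch's worth has arrived,
        # run a single optimizer step, then go back to serving.
        self.buffer_x.append(x)
        self.buffer_y.append(y)
        if len(self.buffer_x) < self.micro_batch:
            return None
        xb, yb = torch.stack(self.buffer_x), torch.stack(self.buffer_y)
        self.buffer_x, self.buffer_y = [], []
        self.model.train()
        self.opt.zero_grad()
        loss = self.loss_fn(self.model(xb), yb)
        loss.backward()
        self.opt.step()
        return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    learner = OnlineLearner()
    # Synthetic stream: predictions and training updates interleave on the same model.
    for step in range(200):
        x = torch.randn(8)
        _ = learner.predict(x.unsqueeze(0))   # serve a prediction
        y = x.sum().unsqueeze(0)              # toy regression target
        loss = learner.observe(x, y)          # may trigger one training micro-batch
        if loss is not None:
            print(f"step {step}: micro-batch loss {loss:.4f}")
```

With a model this small, each micro-batch step is cheap enough that predictions and updates can alternate on the same hardware; doing the same with a GPT-scale model would make each update far too expensive to interleave with serving.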
