6 Comments

I tend to think that this is simply a hardware and software architecture and dataset classification problem. As dataset selection methodologies evolve, couldn't a large model simply be utilized and trained simultaneously? If NNs are coarsely modeled after mammalian brains, couldn't they also be adapted to walk and chew gum at the same time?

I mostly agree that there are currently hardware and software limitations that require solutions like embeddings and vector storage to extend an LLM's knowledge base. I also don't think what we have today is going to be the final architecture for something that truly mimics the human brain. The priority right now seems to be LLMs that can master understanding language and interpreting directives (in order to predict the next set of characters). Where the LLM's "memories" or "brain" sit is probably going to keep changing and evolving.

And queries destined for an LLM could directly inform training priorities (hopefully this is already happening).

This is missing something: it can be more feasible to continuously update (train) a smaller model with new data while still running predictions on it atomically between training micro batches. That is not pie in the sky. It doesn't make sense to do with GPT, but it does make sense with a smaller model. This is the simplest such purely neural approach.
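To make the idea concrete, here is a minimal sketch of interleaving training and prediction. Everything in it is an assumption for illustration: a tiny logistic regression stands in for the "smaller model", `toy_stream` is a hypothetical data source, and a lock is one simple way to keep each prediction atomic with respect to weight updates. It is not how any particular LLM is trained.

```python
import math
import random
import threading


class OnlineLogisticModel:
    """Tiny logistic regression, a stand-in for the 'smaller model'."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        # Guards the weights so a prediction never observes a half-applied update.
        self._lock = threading.Lock()

    def _prob(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        with self._lock:
            return self._prob(x)

    def train_micro_batch(self, batch):
        """Apply one averaged gradient step on a list of (features, label) pairs."""
        with self._lock:
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in batch:
                err = self._prob(x) - y
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi
                grad_b += err
            n = len(batch)
            self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n


def toy_stream(n_batches, batch_size, seed=0):
    """Hypothetical stream of new data: the label is 1 when x[0] > x[1]."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            batch.append((x, 1.0 if x[0] > x[1] else 0.0))
        yield batch


model = OnlineLogisticModel(n_features=2)
probe = [1.0, -1.0]  # an input whose true label is 1
history = []
for batch in toy_stream(n_batches=200, batch_size=8):
    model.train_micro_batch(batch)         # learn from the newest data
    history.append(model.predict(probe))   # serve a prediction before the next batch

print(f"after 1 batch: {history[0]:.3f}, after 200 batches: {history[-1]:.3f}")
```

The loop here is single-threaded, so the lock is only there to show where atomicity would matter if training and serving ran on separate threads. Each micro batch costs one forward and backward pass over the weights, which is why this is cheap for a small model and impractical for something GPT-sized.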

Agreed, training a smaller model is also an option depending on the use case. It seems like LLMs are also going to get smaller over time without sacrificing their ability to interpret. However, processing and cost would be something to watch out for with this approach.

The small LLM (which is updated continuously) can also hand off its notes to a large LLM (which is updated rarely) for a final review before a response is given to the user. Yes, cost is an issue for continuous updates to a small LLM.
