How is model size measured for Large Language Models?
AI Scaling

ChatGPT4 gives a far better answer than Gemini/Bard, although ChatGPT's outdated training cutoff of 'early 2023' is a handicap.

Hoffmann et al. found "that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled." One of the authors, Arthur Mensch, has since founded a company, Mistral, built around making the most efficient AI components and advancing Europe in the AI arms race (an expression I don't care for, because it conflicts with safety).

ChatGPT4

Model size for Large Language Models (LLMs), including GPT models, is typically measured by the number of parameters they have. These parameters are the internal settings learned from data during the training process, and they dictate how the model responds to input queries. A parameter in this context is a component of the model's architecture that can adjust and learn from the data during training.
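To make the parameter count concrete, here is a minimal PyTorch-style sketch; the toy model and its dimensions are made up purely for illustration, but the tally at the end is exactly what "model size" usually refers to:

```python
import torch.nn as nn

# A toy model; the dimensions are arbitrary, chosen only so the arithmetic
# is easy to follow. Real LLMs use far larger values and many more layers.
toy_model = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=512),  # token embeddings
    nn.Linear(512, 2048),   # feed-forward up-projection
    nn.ReLU(),
    nn.Linear(2048, 512),   # feed-forward down-projection
)

# "Model size" is simply the total count of learnable weights and biases.
n_params = sum(p.numel() for p in toy_model.parameters())
print(f"{n_params:,} parameters")  # about 27.7M for this toy configuration
```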
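Tying this back to the Hoffmann et al. quote above: their compute-optimal rule pairs a parameter count N with a token budget D, with training compute commonly estimated as C ≈ 6·N·D FLOPs, so doubling N should come with a doubling of D. A rough back-of-the-envelope sketch follows; the ~20 tokens-per-parameter figure is the commonly cited Chinchilla heuristic, not an exact law:

```python
# Back-of-the-envelope Chinchilla-style scaling, per Hoffmann et al. (2022).
# Assumptions: training compute C ~= 6 * N * D FLOPs, and a compute-optimal
# budget of roughly 20 training tokens per parameter (a common heuristic).

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a model size."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard C ~= 6 * N * D estimate of training compute."""
    return 6.0 * n_params * n_tokens

for n in (7e9, 14e9, 28e9):          # doubling the parameter count each step...
    d = compute_optimal_tokens(n)    # ...doubles the token budget as well
    print(f"{n/1e9:>4.0f}B params -> {d/1e9:>5.0f}B tokens, "
          f"~{training_flops(n, d):.1e} training FLOPs")
```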