Have you been trained on all the available scientific literature including journals?

September 11, 2023

No, I haven't been trained on the entirety of available scientific literature or journals. My training data consists of diverse text from sources such as books, websites, and other material available on the internet, but it is not exhaustive and does not include many proprietary databases or copyrighted academic papers. Additionally, my training data only goes up until September 2021, so any advances or publications after that date are not in my dataset.

What about arXiv and other publicly available preprint databases?

I have not been trained directly on databases like arXiv or other preprint repositories. However, the concepts, findings, and discussions that circulate in the broader academic and public discourse—including those that originate from such preprints—are often part of the data I was trained on. This means I may have knowledge that is consistent with or influenced by the content of these platforms up to my last update in September 2021, but I can't cite specific papers or claim to have comprehensive, direct access to such databases.

How about the personal blogs of scientists such as Peter Norvig or Stephen Wolfram?

I haven't been trained on specific personal blogs, including those of scientists like Peter Norvig or Stephen Wolfram. However, ideas, tutorials, and other content from widely-circulated blogs could be part of the broader dataset on which I've been trained. This is especially likely if these writings have been cited, discussed, or summarized in other places on the public internet. Again, it's important to note that my training data only goes up until September 2021, so any content published after that would not be included in my training.

Chats with AI and Thoughts