📚 The largest truly open library in human history.
📈 64,416,225 books, 95,689,473 papers — preserved forever.
AA: 301 TB (direct uploads)
IA: 304 TB (scraped by AA)
DuXiu: 298 TB (scraped by AA)
Hathi: 9 TB (scraped by AA)
Libgen.li: 214 TB (collab with AA)
Z-Lib: 86 TB (collab with AA)
Libgen.rs: 88 TB (mirrored by AA)
Sci-Hub: 94 TB (mirrored by AA)
🛜 Official domains: FAQ and Wikipedia.
⭐️ Our code and data are 100% open source.
It is well understood that LLMs thrive on high-quality data. We have the largest collection of books, papers, magazines, etc. in the world, which are some of the highest-quality text sources.
Unique scale and range
Our collection contains over a hundred million files, including academic journals, textbooks, and magazines. We achieve this scale by combining large existing repositories.
Some of our source collections are already available in bulk (Sci-Hub, and parts of Libgen). Other sources we liberated ourselves. The Datasets page shows a full overview.
Our collection includes millions of books, papers, and magazines from before the e-book era. Large parts of this collection have already been OCR’ed and have little internal overlap.
How we can help
We’re able to provide high-speed access to our full collections, as well as to unreleased collections.
This is enterprise-level access that we can provide for donations in the range of tens of thousands of USD. We’re also willing to trade this for high-quality collections that we don’t have yet.
We can refund you if you’re able to enrich our data, for example with:
OCR
Removing overlap (deduplication)
Text and metadata extraction
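To give a sense of what "removing overlap" can involve, here is a minimal sketch of text deduplication in Python. It is an illustration only, not a description of our pipeline: the function names, the 5-word shingle size, and the 0.8 similarity threshold are all assumptions chosen for the example. It drops exact duplicates by hashing normalized text, then drops near-duplicates by comparing word shingles with Jaccard similarity.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word sequences in the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: dict, threshold: float = 0.8) -> list:
    """Return the ids of documents to keep, in input order.

    `docs` maps a (hypothetical) document id to its extracted text.
    The first copy of each duplicate group is kept.
    """
    seen_hashes = set()
    kept = []  # list of (doc_id, shingle_set) pairs for documents kept so far
    for doc_id, text in docs.items():
        # Stage 1: exact duplicates after normalization.
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        # Stage 2: near-duplicates, compared against every kept document.
        doc_shingles = shingles(text)
        if any(jaccard(doc_shingles, other) >= threshold for _, other in kept):
            continue
        seen_hashes.add(digest)
        kept.append((doc_id, doc_shingles))
    return [doc_id for doc_id, _ in kept]
```

The pairwise comparison in stage 2 is quadratic in the number of documents, so it only suits small batches. At the scale of a full collection, an approximate method such as MinHash with locality-sensitive hashing would likely be needed instead.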
Support long-term archival of human knowledge, while getting better data for your model!