New
JSONL: 1.1 MB JSON: 1.1 MB PARQUET: 1.4 KB

Caia Math Dataset

A comprehensive Caia math dataset for AI/ML research and development. This dataset contains high-quality, curated data for training and evaluating machine learning models.

The dataset is available in multiple formats including JSON, JSONL, and Parquet to support different workflows and tools. All data is carefully validated and documented to ensure reliability and practical utility.

Released under CC0 1.0 Universal, which permits both research and commercial use.

License: CC0 1.0 Universal
Last Updated: August 21, 2025
Version: 1.0.0
New
JSONL: 7.8 MB JSON: 7.9 MB PARQUET: 7.3 KB

Caia System Design Dataset

A comprehensive Caia system design dataset for AI/ML research and development. This dataset contains high-quality, curated data for training and evaluating machine learning models.

The dataset is available in multiple formats including JSON, JSONL, and Parquet to support different workflows and tools. All data is carefully validated and documented to ensure reliability and practical utility.

Released under CC0 1.0 Universal, which permits both research and commercial use.

License: CC0 1.0 Universal
Last Updated: August 21, 2025
Version: 1.0.0
New
JSONL: 22.5 MB JSON: 22.6 MB PARQUET: 21.7 MB

Cicd Epic Dataset

A comprehensive CI/CD dataset for AI/ML research and development. This dataset contains high-quality, curated data for training and evaluating machine learning models.

The dataset is available in multiple formats including JSON, JSONL, and Parquet to support different workflows and tools. All data is carefully validated and documented to ensure reliability and practical utility.

Released under CC0 1.0 Universal, which permits both research and commercial use.

License: CC0 1.0 Universal
Last Updated: August 21, 2025
Version: 1.0.0
New
JSONL: 16.4 MB JSON: 16.6 MB PARQUET: 15.2 MB

Go Epic Dataset

A comprehensive Go programming dataset containing examples from basic syntax to distributed systems and algorithms. Each entry includes working code with explanations covering why specific patterns are used and how they apply to real development.

The dataset spans goroutines and channels, HTTP servers, database patterns, Kubernetes operators, gRPC services, and distributed systems architecture. It includes algorithm problems with multiple solution approaches and performance analysis.

The dataset uses a minimal JSONL format with three fields per entry: instruction, output, and topic. Each example combines explanation and code in a single integrated teaching unit.
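As a rough illustration of the three-field layout described above, the following Python sketch parses JSONL lines, checks that each record carries instruction, output, and topic, and counts entries per topic. The sample records, the topic names, and the helper names (`parse_jsonl`, `topic_counts`) are made up for this example and are not taken from the dataset itself.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# The three fields the dataset description names for every entry.
REQUIRED_FIELDS = ("instruction", "output", "topic")


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line, checking the three documented fields."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record


def topic_counts(records: Iterable[dict]) -> Counter:
    """Count how many records belong to each topic."""
    return Counter(record["topic"] for record in records)


# Hypothetical sample lines; real entries will differ in content and topics.
SAMPLE_LINES = [
    '{"instruction": "Explain buffered channels.", "output": "A buffered channel ...", "topic": "concurrency"}',
    '{"instruction": "Write a minimal HTTP handler.", "output": "package main ...", "topic": "http"}',
    "",
    '{"instruction": "Show a worker pool.", "output": "func worker(...) ...", "topic": "concurrency"}',
]

if __name__ == "__main__":
    records = list(parse_jsonl(SAMPLE_LINES))
    print(topic_counts(records).most_common())
```

To run the same check against a downloaded file, pass an open file handle instead of the sample list, for example `parse_jsonl(open(path, encoding="utf-8"))`, where `path` points at wherever the JSONL file was saved.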

Released under the CC0 1.0 Universal license. Available on GitHub, Hugging Face, and caiatech.com.

License: CC0 1.0 Universal
Last Updated: August 21, 2025
Version: 1.0.0
New
JSONL: 5.7 MB JSON: 5.8 MB PARQUET: 5.6 MB

Python Epic Dataset

A comprehensive Python dataset for AI/ML research and development. This dataset contains high-quality, curated data for training and evaluating machine learning models.

The dataset is available in multiple formats including JSON, JSONL, and Parquet to support different workflows and tools. All data is carefully validated and documented to ensure reliability and practical utility.

Released under CC0 1.0 Universal, which permits both research and commercial use.

License: CC0 1.0 Universal
Last Updated: August 21, 2025
Version: 1.0.0
New
JSONL: 1.4 MB JSON: 1.4 MB PARQUET: 876.9 KB

Rust Epic Dataset

A comprehensive Rust dataset for AI/ML research and development. This dataset contains high-quality, curated data for training and evaluating machine learning models.

The dataset is available in multiple formats including JSON, JSONL, and Parquet to support different workflows and tools. All data is carefully validated and documented to ensure reliability and practical utility.

Released under CC0 1.0 Universal, which permits both research and commercial use.

License: CC0 1.0 Universal
Last Updated: August 21, 2025
Version: 1.0.0

More Datasets Coming Soon

We're preparing additional high-quality datasets for computer vision, NLP, and reinforcement learning tasks.

Open Source

All datasets are released under permissive licenses, making them free to use for research and commercial applications.

High Quality

Each dataset is carefully curated and validated to ensure accuracy, completeness, and practical utility for AI/ML tasks.

Multiple Formats

Datasets are available in JSON, JSONL, and Parquet formats to support different workflows and tools.
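For readers moving between the JSON and JSONL downloads, the sketch below shows one way the two formats can relate. It assumes the JSON file is a single top-level array holding the same records that the JSONL file stores one per line; that layout is an assumption for illustration, not something this page states. The function names and sample record are hypothetical.

```python
import json


def jsonl_to_json(jsonl_text: str) -> str:
    """Collect one-record-per-line JSONL text into a single JSON array string."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(records, ensure_ascii=False, indent=2)


def json_to_jsonl(json_text: str) -> str:
    """Write each element of a top-level JSON array on its own line."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Hypothetical two-record sample, not taken from any published dataset.
SAMPLE_JSONL = (
    '{"instruction": "Add 2 and 3.", "output": "5", "topic": "arithmetic"}\n'
    '{"instruction": "Define a slice.", "output": "A slice is ...", "topic": "go"}\n'
)

if __name__ == "__main__":
    as_json = jsonl_to_json(SAMPLE_JSONL)
    print(as_json)
    print(json_to_jsonl(as_json), end="")
```

JSONL suits line-by-line streaming of large files, while a JSON array is convenient when the whole dataset fits in memory. Parquet is a binary columnar format and needs a dedicated reader library rather than the standard `json` module.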