1 HF Datasets
Hugging Face Datasets is a Python library that provides easy access to datasets hosted on the Hugging Face Hub.
1.1 Installing
pip install datasets "huggingface_hub[hf_xet]"
1.3 Downloading a dataset
Despite its name, a Dataset object does not hold a whole dataset. It holds a single split of a dataset, or a single split of one subset of a dataset (Figure 1.1).
Figure 1.1: A dataset has many splits, or many subsets that each have many splits; one split is equivalent to one Dataset object.
from datasets import load_dataset
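ds = load_dataset("openslr/librispeech_asr", split="validation.clean", streaming=True)
Because streaming=True, this returns an IterableDataset that reads examples lazily as you iterate instead of downloading the whole split first. It is shorthand for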
from datasets import load_dataset_builder
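# Get the builder for the dataset, then ask it for one split as a streaming dataset
builder = load_dataset_builder("openslr/librispeech_asr")
ds = builder.as_streaming_dataset(split="validation.clean")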