Leyu on Hugging Face

Welcome to the official Hugging Face organization of Leyu by gheero!

Featured Datasets (Leyu Amharic Dialects)

About the Datasets

Our datasets are a specialized collection of speech audio focused on low-resource African languages, currently emphasizing dialects of local Ethiopian languages. Designed primarily for Speech-to-Text (STT) research, the corpus captures the distinct phonetic nuances and rhythmic patterns of each dialect.

The audio was recorded in real-world environments by contributors using mobile devices, which yields diverse acoustic conditions that help train more robust models. Every recording undergoes rigorous manual review, in which designated reviewers verify that the transcript aligns with the audio and that the audio is clear.

To support inclusive and representative AI systems, we prioritized demographic diversity across the collection:

gheero Blogs

Explore more about our work on low-resource languages, dialect research, and inclusive AI development: