Mozilla is helping build Luganda voice technology (for free) via its Common Voice project. Anyone can go to the Common Voice website and record sentences for the project. We need as many speakers and accents as possible in order to create robust technologies. Donate your voice now.
There are no open speech datasets for local languages, and the few existing resources are copyrighted. This stifles research and innovation in voice technologies for these languages.
Luganda has been launched on Common Voice, a project that helps make Luganda voice recognition technology open and accessible to everyone. The Common Voice platform encourages contributions from a wide range of local community members. The aim is the largest representative (age, gender, accents, etc.) Luganda voice dataset, which anyone can use to build innovative voice technology that works for every Luganda speaker.
What is Common Voice?
Common Voice is Mozilla’s effort to democratize speech technology. It was launched to address biases in voice technology and the resulting inequalities in access to it. Since launching in 2017 we've made unparalleled progress in terms of language representation: there is no comparable initiative, nor any open (CC0) dataset, that includes as many languages (importantly, also under-resourced and under-served ones), making it the largest multilingual public domain voice dataset.
Common Voice Community Workflow
How did we do it?
We took two main steps to add Luganda to the Common Voice project:
Translate the user interface and information into Luganda
Collect text sentences which users will read out loud
Donate Your Voice!
In order for quality technologies to be created for the Luganda language, we need more voices!
Anyone can record and donate sentences for the Common Voice project, and the more voices we get, the more accurate the technology becomes.
Common Voice sentences must be released under the Creative Commons CC0 license. Such sentences are difficult to find because most publicly available text is copyrighted. We got our initial batch of 5,000 sentences from play scripts and classroom sessions; others were created by linguists from the Department of languages. These sentences were validated against Mozilla’s sentence criteria before they were uploaded to Common Voice. Currently, we have 30,000 Luganda text sentences available on Common Voice.
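To give a feel for what validating sentences before upload can look like, here is a minimal Python sketch. It is not the script the team used, and the rules in it (a word limit, no digits, no acronyms, no duplicates) are illustrative assumptions rather than Mozilla’s actual criteria. The names `MAX_WORDS`, `check_sentence` and `filter_sentences` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed word limit, chosen only for illustration.
MAX_WORDS = 14


def check_sentence(sentence: str) -> List[str]:
    """Return the reasons a sentence fails; an empty list means it passes."""
    text = sentence.strip()
    if not text:
        return ["empty sentence"]
    problems = []
    if len(text.split()) > MAX_WORDS:
        problems.append("too many words")
    # Digits can be read aloud in several ways, so this sketch rejects them.
    if re.search(r"\d", text):
        problems.append("contains digits")
    # Two or more consecutive capital letters are treated as an acronym.
    if re.search(r"\b[A-Z]{2,}\b", text):
        problems.append("contains an acronym")
    return problems


def filter_sentences(sentences: Iterable[str]) -> List[str]:
    """Keep sentences that pass every check, dropping case-insensitive duplicates."""
    seen = set()
    accepted = []
    for sentence in sentences:
        key = sentence.strip().lower()
        if key in seen or check_sentence(sentence):
            continue
        seen.add(key)
        accepted.append(sentence.strip())
    return accepted
```

Calling `filter_sentences` on a batch of candidate sentences returns only those that pass the assumed checks. Checks like these cannot replace human review: spelling, grammar and licensing still need to be confirmed by a person.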
The Makerere NLP Club Voice Mobilizers are reaching out to communities such as universities, churches and schools to raise awareness of Common Voice. Community members are encouraged to read and validate sentences on Common Voice frequently.
Common Voice interface to either read or validate donated voices
Reading a Luganda sentence on the Common Voice Platform
Validating donated voices
Diversity in both the text corpus and voice collections is important for Common Voice, because good speech technologies should recognize the speech of people speaking about different topics irrespective of their accents, age and gender.
In order to translate the user interface into Luganda, a team of contributors worked together to ensure that the translations are natural-sounding and accurate. This was done using the Pontoon System.
Pontoon System used to translate Common Voice Platform to Luganda.
Project Team and collaborators:
- Ronald Ogwang
- Jeremy Francis Tusubira
- Jonathan Mukiibi
- Dr. Andrew Katumba
- Dr. Joyce Nakatumba-Nabende
- Makerere University NLP Club
- Department of African Languages, Makerere University
- Mozilla
- FAIRFORWARD
Current Progress of Luganda on the Common Voice Platform