When it comes to the goal of one million Welsh speakers, language technology has a key role to play.

The Welsh Government has recognised this, but more needs to be done to boost the resources available for creating technology that caters to the Welsh language.

When I decided to specialise in working with minority languages to create usable technology, I knew this would be a tough task. Within the field of language technologies, there is a balance to be struck between policymakers, linguists and software developers, all of whom want to see their project succeed.

Consider the current status of the Welsh language. At the moment, if we include both first and second language speakers, there are about 884,300 Welsh speakers here in Wales.

Of course, there are also Welsh speakers across the world because of colonialism, migration and other factors, and they count in our work too, for the sake of both technological rigour and inclusivity.

Each Welsh speaker has a gift to give policymakers, linguists and software developers: their voice.

To train algorithms specifically to recognise and work with the Welsh language, we need more voices. A diversity of voices is vital if we are to create tools that work for everyone.

I have often heard people complain that Welsh language speech recognition technology is not great, and so they prefer to use an English version. Language technologists here in Wales and beyond, myself included, are working to give everyone the right to good speech recognition in Welsh.

How can you help? Donate your voice! Contributing to Mozilla’s Common Voice project is a great start. Mozilla is building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

Mozilla believes that large, publicly available voice datasets will foster innovation and healthy commercial competition in machine-learning-based speech technology.

Common Voice’s multi-language dataset is already the largest publicly available voice dataset of its kind, but it’s not the only one. The Welsh portion currently stands at a mere 110 hours of validated data, compared to 789 hours of Catalan data, for example, or 2,015 hours of English data.

We need a national push to get people contributing to this dataset. The process is simple: you read sentences aloud and record your voice via the software. Once the recordings are collected, they are validated and can then be used by software developers to build speech-enabled applications.

Speech recognition has many applications, from our home devices to translation software. It is changing the way people interact with their devices, homes, cars, and jobs. The technology allows us to talk to a computer or device that interprets what we're saying in order to respond to our question or command.

If we are aiming for a future in which we work, socialise and live through the medium of Welsh, we need the right technological infrastructure to make that easy and accessible to all.

This would also make the language more accessible to learners, or to people who are simply curious about it, allowing them to interact with a more realistic Welsh voice. It would be a great way to remove the stigma that Welsh is a language of the past or a laughing stock, views I do not agree with and try to challenge daily.

The Welsh Government should continue to encourage people to contribute their voices and knowledge to projects such as Mozilla Common Voice. This collaborative effort, paired with cutting-edge technology, will help Welsh thrive and contribute to the ever-growing field of language technologies for minority languages.
