At the recent Radiodays Asia conference, Peter Lucas-Jones, CEO of Te Reo Irirangi o Te Hiku o Te Ika (Te Hiku Media), delivered an insightful presentation on the innovative use of Māori language radio broadcasts to train speech and language technologies.
The initiative is a significant step towards the preservation and revitalization of the Māori language, leveraging decades of radio broadcasts to develop large language models.
Peter Lucas-Jones, a seasoned Māori language broadcaster and digital content creator, is at the forefront of this project. He emphasized the importance of Indigenous data sovereignty and the role of natural language processing (NLP) in supporting Indigenous language revitalization.
The project, which has been in development for over 30 years, aims to protect and utilize the vast archive of Māori language radio broadcasts. “We are protectors of data. We have the largest archive in the tribal radio network.”
This archive serves as a crucial resource for training AI models, ensuring that the language and culture are preserved for future generations. The LLM training is kept inside a closed system rather than being widely available. “Our land was already stolen, we don’t want our language stolen by big data companies,” he said.
One of the key achievements of Te Hiku Media is the development of a Māori language speech-to-text system with an impressive error rate of only 8%. The system opens up a range of application opportunities, from transcribing archival audio recordings to developing speech interfaces for computers.
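The presentation did not specify how the 8% figure is measured. A common metric for speech-to-text systems is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the system's transcript into a reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration under that assumption, not Te Hiku Media's evaluation method; the function name and the placeholder tokens are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical placeholder tokens stand in for real transcripts.
ref_words = [f"w{i}" for i in range(1, 26)]  # 25-word reference
hyp_words = list(ref_words)
hyp_words[3] = "x"   # one substituted word
del hyp_words[10]    # one dropped word

wer = word_error_rate(" ".join(ref_words), " ".join(hyp_words))
print(f"WER: {wer:.0%}")
```

Under this reading, an 8% error rate would correspond to roughly two wrong words in every 25 words of reference text.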
The project also emphasizes the importance of community involvement. In 2014, Te Hiku Media started video streaming and created online consumer groups for their content. “More than 80% of tribal members live outside our tribal area, but we find that people want to be connected with the content from their tribal area. People are using devices to consume our content every day everywhere,” Lucas-Jones explained.
Te Hiku Media’s efforts have not gone unnoticed. The Hawaiian community is now fine-tuning its own Indigenous language speech-to-text model based on Te Hiku Media’s experience. This cross-cultural collaboration highlights the broader impact of the project on Indigenous language preservation worldwide.
Lucas-Jones also highlighted the challenges faced by the Māori language due to historical oppression. “When we first started to work with elders, we discovered that the original language was beaten out of them. It had a devastating effect on the language, and tribal radio is helping to revive those languages.” By transcribing early archives, Te Hiku Media aims to preserve and make available the language as it was spoken before colonisation took its toll.
The project has received support from various quarters, including Nvidia, which supplied GPUs to set up the AI environment. This support has enabled Te Hiku Media to keep the project within their community, ensuring that the data remains sovereign. “We are not teaching other computers to speak Māori… we are keeping it at home.”
The initiative has also led to the development of synthetic bilingual voices, further enhancing the accessibility and usability of the Māori language.
“We don’t want our sovereign languages sold back to our grandchildren as data as a service. Our land was already stolen; we don’t want our language stolen by big data companies. Data is land,” Lucas-Jones asserted.
The use of Māori language radio broadcasts to train AI models represents a significant advancement in the preservation and revitalization of the Māori language. Peter Lucas-Jones and his team at Te Hiku Media have demonstrated the power of community-driven initiatives in leveraging technology for cultural preservation.
Lucas-Jones concluded: “Language is the key to culture. It springs from the life and the landscape. It is often imbued with philosophical memory; it is the ideal vehicle to transmit culture.”

Lucas-Jones has been named one of Time Magazine’s top 100 people in AI tech.

