Language is more than communication: it’s memory, ancestry, and the soul of a people.
Across Colombia, dozens of Indigenous languages carry centuries of oral tradition, cosmology, and ecological wisdom. Yet many of these voices risk being lost. According to UNESCO, nearly 40% of the world’s 6,700 spoken languages are endangered, and for many Indigenous communities in Latin America, the threat is not just cultural loss but invisibility.
But what if artificial intelligence could help?
What if, instead of accelerating cultural homogenization, technology became a tool for revitalizing endangered languages?
A groundbreaking new study published in SN Computer Science (Salazar, Manrique, & Pereira Nunes, 2025) takes us one step closer to that future. The paper, titled “Machine Translation Strategies for Low-Resource Colombian Indigenous Languages”, explores how machine translation, particularly Transformer models and Transfer Learning, can be adapted to help preserve and translate Wayuunaiki and Nasa Yuwe, two Indigenous languages with minimal digital resources.
Read the full paper on SpringerLink.
The Research: Building Bridges With Data
The study addresses a profound challenge: how do we build machine translation models for languages with extremely limited datasets?
The researchers used the state-of-the-art Transformer architecture (the deep learning model behind services like Google Translate) to train neural networks that translate Spanish into Wayuunaiki and Nasa Yuwe. But unlike dominant languages, these Indigenous languages lack large bilingual corpora (collections of texts paired across two languages). So the team:
- Created custom datasets by collecting parallel texts.
- Tested shallower Transformer models (fewer encoder/decoder layers).
- Applied Transfer Learning, using pre-trained Spanish-English models and adapting them to Indigenous languages.
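To make the second and third steps more concrete, here is a minimal Python sketch of one common way to set up parent-to-child transfer: weights from a pre-trained (parent) model are copied into the new (child) model wherever a parameter with the same name and shape exists, and everything else is trained from scratch. The layer counts, vocabulary sizes, and parameter names are hypothetical, the model is reduced to a handful of weight matrices, and none of it reproduces the paper’s actual configuration or training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seq2SeqConfig:
    """Hypothetical encoder-decoder sizes (not the paper's settings)."""
    encoder_layers: int
    decoder_layers: int
    d_model: int = 512
    ffn_dim: int = 2048


def parameter_shapes(cfg: Seq2SeqConfig, src_vocab: int, tgt_vocab: int) -> dict[str, tuple[int, int]]:
    """Map a simplified set of weight names to matrix shapes.

    Real Transformers also have cross-attention, layer norms, biases, etc.
    """
    shapes = {
        "src_embed": (src_vocab, cfg.d_model),
        "tgt_embed": (tgt_vocab, cfg.d_model),
    }
    for side, n_layers in (("enc", cfg.encoder_layers), ("dec", cfg.decoder_layers)):
        for i in range(n_layers):
            shapes[f"{side}.{i}.self_attn"] = (cfg.d_model, cfg.d_model)
            shapes[f"{side}.{i}.ffn"] = (cfg.d_model, cfg.ffn_dim)
    return shapes


def plan_transfer(parent: dict[str, tuple[int, int]],
                  child: dict[str, tuple[int, int]]) -> tuple[list[str], list[str]]:
    """Split child weights into (copied from parent, trained from scratch).

    A child weight is copied only if the parent has a weight with the same
    name and the same shape; everything else is freshly initialized.
    """
    copied = [name for name, shape in child.items() if parent.get(name) == shape]
    copied_set = set(copied)
    fresh = [name for name in child if name not in copied_set]
    return copied, fresh


# Deep Spanish-English parent vs. shallow Spanish-to-Wayuunaiki child.
# Layer counts and vocabulary sizes are illustrative assumptions.
parent = parameter_shapes(Seq2SeqConfig(6, 6), src_vocab=32000, tgt_vocab=32000)
child = parameter_shapes(Seq2SeqConfig(2, 2), src_vocab=32000, tgt_vocab=8000)
copied, fresh = plan_transfer(parent, child)
print(f"copied {len(copied)} weights; training {fresh} from scratch")
```

In this toy setup the shallow child inherits the Spanish source embeddings plus its two encoder and two decoder layers from the deeper parent, while the target-side embeddings must be learned anew because the assumed Wayuunaiki vocabulary has a different size. Reusing source embeddings this way only makes sense if both models share the same Spanish subword vocabulary, and which parent layers a shallower child should keep is itself a design choice; the sketch simply takes the lowest-numbered ones.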
What They Found: Smaller Models, Bigger Promise
Despite the constraints, the results were promising:
- Shallower models performed better than deep ones. Simpler architectures yielded higher BLEU and chrF scores (standard translation quality metrics), suggesting that more layers are not always better, especially in low-resource contexts.
- Transfer Learning helped, but selectively. Pre-training with Spanish-English corpora sometimes improved outcomes, but not universally. The key takeaway? Every Indigenous language may require tailored approaches based on its unique structure, data availability, and linguistic features.
- Corpus creation is foundational. The researchers emphasized that translation quality is intrinsically linked to the size and quality of available datasets. Without serious efforts in documentation and digitization, no algorithm, however advanced, can work miracles.
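For readers curious about what these scores measure, below is a simplified sentence-level chrF in Python. chrF treats the system output and the reference translation as bags of character n-grams (here orders 1 to 6, with spaces removed), averages precision and recall across those orders, and combines them into an F-score with beta = 2, so recall counts twice as much as precision. Because it matches characters rather than whole words, an output that gets a word stem right but an ending wrong still earns partial credit. BLEU, by contrast, scores word n-gram precision with a brevity penalty and is not shown here. This is an illustrative sketch of the chrF idea, not the paper’s evaluation code; established tools such as sacreBLEU handle edge cases and corpus-level aggregation differently, so their numbers may not match this function exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale.

    Precision and recall are averaged over n-gram orders 1..max_n and then
    combined into an F-beta score (beta=2 weights recall over precision).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(chrf("la casa grande", "la casa grande"))  # identical strings -> 100.0
print(chrf("las casas", "la casa"))  # partial character overlap
```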
Why This Matters: More Than Just Words
This is not just a story about code and statistics. It’s about survival, dignity, and representation.
Wayuunaiki is spoken by the Wayuu people of the Guajira Peninsula. Nasa Yuwe, by the Nasa people in the southwest Andes of Colombia. Both are rich in metaphor, oral storytelling, and cultural worldview. Enabling their digital presence means:
- Educational equity: Supporting literacy and bilingual education in Indigenous communities.
- Cultural resilience: Allowing younger generations to learn and celebrate their heritage.
- Linguistic justice: Resisting the erasure of ancestral knowledge from the digital world.
The Role of Explorers, Educators, and Creators
At Luminous Photo Expeditions, we believe that storytelling is an act of preservation. As we document endangered rituals, languages, and traditions across continents, we’re reminded again and again: technology must be a companion to culture, not a replacement.
Whether you are a technologist, linguist, educator, or traveler, this paper is a call to action. It invites us to co-create solutions that bridge innovation and tradition, and to support Indigenous-led efforts in language preservation.
Final Reflections
This research is just the beginning. As Salazar and colleagues rightly note, no machine learning model can replace the human commitment required to preserve and cherish these languages.
But perhaps, in the quiet hum of an algorithm fine-tuned for Wayuunaiki, we hear something profound:
A future where ancestral voices are not forgotten, but amplified.
Cite this study:
Salazar, I., Manrique, R., & Pereira Nunes, B. (2025). Machine Translation Strategies for Low-Resource Colombian Indigenous Languages. SN Computer Science. https://doi.org/10.1007/s42979-025-04255-z