There are 600M Spanish speakers and 265M Portuguese speakers in the world. Spanish and Portuguese are the main languages in 29 countries, each of them rich in culture. Although language models show ever-growing multilingual capabilities, are they truly multicultural?
Join the #HackathonSomosNLP, an international online competition whose main goal is to create diverse and open NLP resources for the languages of Ibero-America 🚀
This year we are celebrating the fifth edition. Curious about the results from previous years? Read on!
Winning projects
Hackathon SomosNLP 2025: Cultural Alignment
The top three corpora in the preferences challenge are:
- 🥇 TralaleloTralala-MemeAlign
- 🥈 IberoTales
- 🥉 HoCV-COL
Finalist teams:
- 👏 Comida Colombia + Ecuador
- 👏 Cresia
- 👏 Equipo LeIA
- 👏 Falsos Amigos
- 👏 Refranero Afro-Cubano
- 👏 Sabiduría Popular Castellana
- 👏 Think Paraguayo
Notable collective achievements:
- 📚 INCLUDE: 38,000+ exam questions from 23 countries
- 📚 BLEND: extension of the cultural-knowledge benchmark
- 📚 ~1,000 stereotypes collected and validated
More information about the Hackathon 2025 projects
Hackathon SomosNLP 2024: #Somos600M
The three winning projects are:
- 🥇 NoticIA: Clickbait News Summarization
- 🥈 AsistenciaRefugiados: Legal assistance for refugees
- 🥉 TraductorInclusivo: Text rewriting using inclusive language
And the community’s favorite project is:
- 💛 AviaciónInteligente: Navigating the Colombian Aeronautical Regulations
Special mentions go to the projects:
- 👏 ThinkParaguayo: Discover Guaraní culture
- 👏 LenguajeClaro: Administrative language simplification
- 👏 BERTIN-ClimID: BERTIN-Base Climate-related text Identification
And to the corpora:
- 📚 SMC: Spanish Medical Corpus
- 📚 RecetasDeLaAbuel@: A recipe corpus from Spanish-speaking countries
- 📚 LingComp_QA: An educational computational-linguistics corpus in Spanish
- 📚 KUNTUR: Peruvian Political Constitution of 1993
- 📚 Province identification and summaries from the Rural Spanish Oral and Sound Corpus
Hackathon SomosNLP 2023: LLMs in Spanish
In this second edition, more than 500 people from 30 countries participated, developing 22 projects and publishing 3 papers.
More information about Hackathon 2023
Hackathon SomosNLP 2022: NLP in Spanish
In the first edition, more than 500 people from 29 countries participated. Featured projects:
- 🥇 BiomedIA: a voice-to-voice biomedical Q&A system, which led to a paper presented at NAACL 2022 that won the Best Poster Presentation Award
- 🥈 Mexican Legal Model: a model used by the Supreme Court of Justice of Mexico
- 🥉 Gender Neutralization: rewriting texts in an inclusive way
- 💜 Sexism Detector: a contribution to removing sexist comments
More information about Hackathon 2022
Published papers
The hackathon projects and the community’s collective achievements have led to the following papers:
- Grandury, M., Aula-Blasco, J., Falcão, J., Fourrier, C., González, M., Martínez, G., Santamaría, G., … (2025). La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America. ACL Main.
- Salazar, I., Fernández Burda, M., Bin Islam, S., Soltani Moakhar, A., Singh, S., Farestam, F., Romanou, A., … Grandury, M., … (2025). Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation. ICLR.
- Grandury, M. (2024). The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain. LatinX in AI (LXAI) Research Workshop @NAACL 2024.
- Mayor-Rocher, M., Melero, N., Merino-Gómez, E., Grandury, M., Conde, J., & Reviriego, P. (2024). Evaluating large language models with tests of Spanish as a foreign language: Pass or fail?
- Plaza, I., Melero, N., del Pozo, C., Conde, J., Reviriego, P., Mayor-Rocher, M., & Grandury, M. (2024). Spanish and LLM Benchmarks: Is MMLU lost in translation?
- García-Ferrero, I., & Altuna, B. (2024). NoticIA: A Clickbait Article Summarization Dataset in Spanish. Procesamiento del Lenguaje Natural, 73, 191-207.
- Huerta, G. & Zuñiga Rojas, G. (2024). Identificación de textos relacionados al cambio climático y sustentabilidad utilizando modelos de lenguaje preentrenados en español. LatinX in AI (LXAI) Research Workshop @NAACL 2024.
- Morales-Garzón, A., Benel Ramirez, S., Tuco Casquino, G., A. Rocha, O., & Medina, A. (2024). Aprendiendo a cocinar de manera saludable con Large Language Models, Supervised Fine Tuning y Retrieval Augmented Generation. LatinX in AI (LXAI) Research Workshop @NAACL 2024.
- Jair Bejarano Sepulveda, E., Nicolai Potes Patiño, H., Pineda Montoya, S., Ivan Rodriguez, F., Enrique Orduy, J., Stevens Traslaviña, D., Mauricio Rosales, A. & Nicolás Madrid, S. (2024). Towards Improved RAC Accessibility: Dataset and LLMs, approach to enhancing RAC accessibility. LatinX in AI (LXAI) Research Workshop @NAACL 2024.