#HackathonSomosNLP

Join the largest open-source Natural Language Processing hackathon in Spanish and Portuguese


There are 600M Spanish speakers and 265M Portuguese speakers in the world. Spanish and Portuguese are the main languages in 29 countries, each of them rich in culture. Although language models show ever-growing multilingual capabilities, are they truly multicultural?

Join the #HackathonSomosNLP, an international online competition whose main goal is to create diverse and open NLP resources for the languages of Ibero-America 🚀

Hackathon 2026

This year we are celebrating the fifth edition. Curious about the results from previous years? Read on!


Winning projects

Hackathon SomosNLP 2025: Cultural Alignment

The top three corpora in the preferences challenge are:

  • 🥇 TralaleloTralala-MemeAlign
  • 🥈 IberoTales
  • 🥉 HoCV-COL

Finalist teams:

  • 👏 Comida Colombia + Ecuador
  • 👏 Cresia
  • 👏 Equipo LeIA
  • 👏 Falsos Amigos
  • 👏 Refranero Afro-Cubano
  • 👏 Sabiduría Popular Castellana
  • 👏 Think Paraguayo

Notable collective achievements:

  • 📚 INCLUDE: 38,000+ exam questions from 23 countries
  • 📚 BLEND: extension of the cultural-knowledge benchmark
  • 📚 ~1,000 stereotypes collected and validated

More information about the Hackathon 2025 projects

Hackathon SomosNLP 2024: #Somos600M

The three winning projects are:

  • 🥇 NoticIA: Clickbait News Summarization
  • 🥈 AsistenciaRefugiados: Legal assistance for refugees
  • 🥉 TraductorInclusivo: Text rewriting using inclusive language

And the community’s favorite project is:

  • 💛 AviaciónInteligente: Navigating the Colombian Aeronautical Regulations

Special mentions to the projects:

  • 👏 ThinkParaguayo: Discover Guaraní culture
  • 👏 LenguajeClaro: Administrative language simplification
  • 👏 BERTIN-ClimID: BERTIN-Base Climate-related text Identification

And to the corpora:

  • 📚 SMC: Spanish Medical Corpus
  • 📚 RecetasDeLaAbuel@: A recipe corpus from Spanish-speaking countries
  • 📚 LingComp_QA: An educational computational-linguistics corpus in Spanish
  • 📚 KUNTUR: Peruvian Political Constitution of 1993
  • 📚 Province identification and summaries from the Rural Spanish Oral and Sound Corpus

Hackathon SomosNLP 2023: LLMs in Spanish

In this second edition, more than 500 people from 30 countries participated, producing 22 projects and 3 published papers.

More information about Hackathon 2023

Hackathon SomosNLP 2022: NLP in Spanish

In the first edition, more than 500 people from 29 countries participated.

More information about Hackathon 2022


Published papers

The hackathon projects and the community’s collective achievements have led to the following papers:

  • Grandury, M., Aula-Blasco, J., Falcão, J., Fourrier, C., González, M., Martínez, G. & Santamaría, G., … (2025). La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America, ACL Main.
  • Salazar, I., Fernández Burda, M., Bin Islam, S., Soltani Moakhar, A., Singh, S., Farestam, F., Romanou, A., … Grandury, M. … (2025). Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation, ICLR.
  • Grandury, M. (2024). The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain. LatinX in AI (LXAI) Research Workshop @NAACL 2024.
  • Mayor-Rocher, M., Melero, N., Merino-Gómez, E., Grandury, M., Conde, J., & Reviriego, P. (2024). Evaluating large language models with tests of Spanish as a foreign language: Pass or fail?
  • Plaza, I., Melero, N., del Pozo, C., Conde, J., Reviriego, P., Mayor-Rocher, M., & Grandury, M. (2024). Spanish and LLM Benchmarks: Is MMLU lost in translation?
  • García-Ferrero, I., & Altuna, B. (2024). NoticIA: A Clickbait Article Summarization Dataset in Spanish. Procesamiento del Lenguaje Natural, 73, 191-207.
  • Huerta, G. & Zuñiga Rojas, G. (2024). Identificación de textos relacionados al cambio climático y sustentabilidad utilizando modelos de lenguaje preentrenados en español. LatinX in AI (LXAI) Research Workshop @NAACL 2024.
  • Morales-Garzón, A., Benel Ramirez, S., Tuco Casquino, G., A. Rocha, O., & Medina, A. (2024). Aprendiendo a cocinar de manera saludable con Large Language Models, Supervised Fine Tuning y Retrieval Augmented Generation. LatinX in AI (LXAI) Research Workshop @NAACL 2024.
  • Jair Bejarano Sepulveda, E., Nicolai Potes Patiño, H., Pineda Montoya, S., Ivan Rodriguez, F., Enrique Orduy, J., Stevens Traslaviña, D., Mauricio Rosales, A. & Nicolás Madrid, S. (2024). Towards Improved RAC Accessibility: Dataset and LLMs, approach to enhancing RAC accessibility. LatinX in AI (LXAI) Research Workshop @NAACL 2024.

Talks and workshops

Hackathon 2025

  • Talk by Selene Baez
  • Talk by Alfonso Amayuelas
  • Talk by Andrés Marafioti

Hackathon 2024

  • Talk by Elena González-Blanco
  • Talk by Gabriel Martín
  • Talk by Amanda Curry

Hackathon 2023

  • Fine-tuning large language models
  • Offensive language detection
  • Evaluation with disagreement

Hackathon 2022

  • Event 04
  • Event 05
  • Event 06

See all events