Each participating team will generate a corpus of instructions, train their LLM, and create a demo to share their great work with the community. This year the focus is on projects that represent the richness of Spanish and the diversity of Spanish-speaking people. As always, we encourage projects to have a social impact and be related to one of the UN’s Sustainable Development Goals. Thank you for participating! ✨
✅ Steps to participate
Participating in our hackathon and applying your knowledge to democratize NLP in Spanish is very simple. Go for it!
Join our Discord community (and introduce yourself in #intros!)
Create an account on Hugging Face and join the SomosNLP organization.
Register on Eventbrite and choose the “Hackathon (English)” ticket.
Create your team or join one (teams of 1 to 5 people). Teams must be registered in the #encuentra-equipo channel (more info in the channel’s README).
Create your instruction corpus and push it to hf.co/somosnlp. We recommend using the distilabel library. Check the example notebook provided by the Argilla team!
Write the Dataset Card for your dataset: inspect the dataset, and evaluate and mitigate its biases.
Fine-tune an LLM (up to 7B parameters) and push it to hf.co/somosnlp. You can train it directly from a Space on the Hugging Face Hub using autotrain (no-code) or jupyterlab. You can use T4s sponsored by Hugging Face. Remember to test on smaller machines first to verify that the code is correct, so that you don’t discover errors after several hours of training.
Write the Model Card for your model: evaluate its quality, biases, and carbon footprint. Important: link the dataset used for training.
Create a demo Space to showcase your project to the community. You can use Nvidia T4 - small GPUs. Important: link the dataset(s) and model(s) used.
Submit your project by filling out the submission form. You can continue making modifications until 23:59 Anywhere on Earth on Sunday, April 10 (we will check the commit times 👀).
- Extra. You can present your project to the Workshop LatinX in NLP @NAACL.
- Optionally, do a 5 min presentation of your project in front of the jury and the community.
Help us improve for the next edition by rating different aspects of the hackathon with stars in this mini feedback form. Thanks!
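As a rough illustration of the corpus step above, instruction corpora are often stored as one JSON object per line. The sketch below is only an assumption about what such a file could look like: the field names (`instruction`, `input`, `output`, `variety`), the example records, and the helper functions are hypothetical and are not a format required by the hackathon. The `smoke_subset` helper follows the advice to try things on a small scale before launching a long training run.

```python
import json
import random

# Hypothetical field names; the hackathon does not prescribe this schema.
REQUIRED_FIELDS = ("instruction", "output")

# Illustrative records only, tagged with an assumed "variety" label.
RECORDS = [
    {"instruction": "¿Cómo se le dice al autobús en México?", "input": "",
     "output": "camión", "variety": "es-MX"},
    {"instruction": "¿Cómo se le dice al autobús en Cuba?", "input": "",
     "output": "guagua", "variety": "es-CU"},
    {"instruction": "¿Cómo se le dice al autobús en Argentina?", "input": "",
     "output": "colectivo", "variety": "es-AR"},
]


def validate(record):
    """Return True if every required field is a non-empty string."""
    return all(
        isinstance(record.get(field), str) and bool(record[field].strip())
        for field in REQUIRED_FIELDS
    )


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping accented characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def smoke_subset(records, n, seed=0):
    """Pick a small reproducible sample for a quick end-to-end test run."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


if __name__ == "__main__":
    assert all(validate(r) for r in RECORDS)
    print(to_jsonl(smoke_subset(RECORDS, 2)))
```

Running the training script on a small sample like this first is a cheap way to catch formatting or code errors before using GPU hours on the full corpus.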
If you have any questions, we are at your disposal in the #pide-ayuda channel. Write a descriptive title and select the “hackathon” tag.
We wish you much success! 🚀
👏 Evaluation and Prizes
🗓️ Important Dates
- April 10th, 23:59 Anywhere on Earth: Deadline to submit projects to the Hackathon #Somos600M and the LatinX in NLP @NAACL Workshop.
- April 11th: Live project presentations, 5 mins per team.
- April 18th: Announcement of the winning projects.
- Soon: Live presentation of the winning projects, 30 mins per team.
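Because the deadline is given in “Anywhere on Earth” time, it can help to convert it to UTC or your local time. The sketch below assumes the usual convention that Anywhere on Earth corresponds to UTC-12 and takes the year 2024 from the edition name. The `is_on_time` helper is hypothetical and only illustrates how a commit timestamp could be compared against the deadline; it is not the organizers’ actual check.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is conventionally UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")

# April 10th, 23:59 AoE; the year 2024 is taken from the edition name.
DEADLINE = datetime(2024, 4, 10, 23, 59, tzinfo=AOE)


def is_on_time(commit_time):
    """Return True if a timezone-aware datetime is at or before the deadline."""
    return commit_time <= DEADLINE


if __name__ == "__main__":
    # The same instant expressed in UTC.
    print(DEADLINE.astimezone(timezone.utc).isoformat())
    # prints 2024-04-11T11:59:00+00:00
```

In other words, under this convention the deadline falls at 11:59 UTC on April 11th.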
🏆 Benefits and Prizes
All participants 👏
- Access to PRO endpoints on Hugging Face for creating synthetic corpora.
- Access to GPUs with up to 25GB of RAM on Hugging Face for model training and demo.
- Access to “persistent storage” on Hugging Face for creating Argilla annotation spaces.
- Support to present your project at the LatinX in NLP @NAACL 2024 workshop, one of the most important international NLP conferences. Learn how in this post. Also, the LatinX in AI team is available for questions!
Everyone who presents a project 🚀
- Certificate of participation or of winning team for the “Hackathon #Somos600M 2024” (verified on our website).
- 20% discount for the WomenTech Global Conference 2024.
- Possibility of obtaining a completely free pass to attend the WomenTech Global Conference 2024 (let us know your interest in the project submission form).
- Possibility of receiving a nomination to join Nova (let us know your interest in the project submission form).
- Possibility to continue developing your project with our support. Contact us!
3rd place team (prizes per person) 🥉
- Certificate, recognition on the website and social media, and honorary role in the Discord server.
- 20k credits from MonsterAPI by Q Blocks for LLM training.
2nd place team (prizes per person) 🥈
- Certificate, recognition on the website and social media, and honorary role in the Discord server.
- 30k credits from MonsterAPI by Q Blocks for LLM training.
1st place team (prizes per person) 🥇
- Certificate, recognition on the website and social media, and honorary role in the Discord server.
- 50k credits from MonsterAPI by Q Blocks for LLM training.
✅ Project Evaluation
A complete project consists of an instruction corpus, a model, and a demo. That said, given the hackathon’s focus on data, we also accept projects that focus only on corpus creation (maximum score: 7 points).
Corpus (4 points):
- Focus on linguistic varieties
- Correct corpus structure
- Corpus creation technique
- Clarity and reproducibility of scripts
- Completeness of the Dataset Card
- Corpus quality
Model (3 points):
- Training method used
- Clarity and reproducibility of scripts
- Completeness of the Model Card
- Model evaluation
Demo (1 point):
- Clarity and UX of the demo
Project and presentation (2 points):
- Motivation, originality, and social impact
- Clarity and quality of the presentation
Extra point:
- Each jury member can assign an extra point to a project that has particularly caught their attention.
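To make the arithmetic of the rubric above concrete, the following sketch adds up the maximum points per category as listed. The category keys, the `total_score` helper, and the way extra points are added are assumptions for illustration; how the jury actually aggregates scores is not specified here.

```python
# Maximum points per category, as listed in the rubric above.
MAX_POINTS = {
    "corpus": 4,
    "model": 3,
    "demo": 1,
    "project_and_presentation": 2,
}


def total_score(scores, extra_points=0):
    """Sum category scores after checking each one against its maximum.

    `extra_points` is a hypothetical count of jury members who awarded
    their optional extra point to the project.
    """
    for category, value in scores.items():
        if category not in MAX_POINTS:
            raise KeyError(f"unknown category: {category}")
        if not 0 <= value <= MAX_POINTS[category]:
            raise ValueError(f"{category} must be between 0 and {MAX_POINTS[category]}")
    return sum(scores.values()) + extra_points


if __name__ == "__main__":
    print(sum(MAX_POINTS.values()))  # prints 10
    print(total_score({"corpus": 3.5, "model": 2, "demo": 1}, extra_points=1))  # prints 7.5
```

Under this reading, a complete project can reach 10 points before any extra points from the jury.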
❓ Frequently Asked Questions
Can I participate if I'm not from a Spanish-speaking country?
Absolutely! While the focus of the hackathon is on the Spanish language and its varieties, we welcome participants from all over the world. Diversity enriches the projects and the community!
Why should I participate?
By joining this hackathon, you will have the opportunity to:
- ✅ Understand how large language models (LLMs) work and discover the challenges of each stage of their development: corpus creation, training, and evaluation
- ✅ Participate in the creation of a high-quality, diverse corpus that includes the different varieties of Spanish and co-official languages (great as an experience and great for your CV)
- ✅ Get all your NLP questions answered during “Ask Me Anything” mentoring sessions
- ✅ Receive support to present your work in a paper
- ✅ Win prizes to continue growing as a professional and get a certificate
- ✅ Join the largest community of Spanish speakers who study, work, and research in NLP
What is the required level?
At SomosNLP, we encourage you to participate regardless of your current knowledge. In previous editions, we have had groups from research institutes as well as groups of undergraduate students. Every project counts!
We are at your disposal to help you in every step of the development of your project! Just post your question in the #pide-ayuda channel or ping us in your project’s thread.
What does the complexity of the projects depend on?
We will provide an example of how to create a dataset, train a model, and create a demo. It’s up to you and your team to choose how much to research and work to improve the base version. The difficulty also depends on the use case, the origin of the data, the time you dedicate to its curation, the training technique, the iterations you make, and how elaborate you want your demo to be. You are free to choose everything!
How long do I have to form a team?
Ideally, all teams will register during the first week of the hackathon, by March 8th.
EDIT: We accept new teams until April 7th.
How can I find a team?
Finding a team is easy! Check the README of the #encuentra-equipo channel on our Discord server!
You have two options:
- 👀 Filter for posts from other participants who are “looking for people” and respond to them, OR
- 📢 Create a new thread specifying the topic you would like to work on
We encourage diversity in teams, including a mix of skills, experiences, and backgrounds. This diversity often leads to more innovative and comprehensive projects.
Can there be teams of 1 person?
Yes, we accept teams between 1 and 5 people.
How do you recommend organizing the work?
- Use your project channel on Discord to communicate and organize
- Since it’s an international hackathon, we recommend asynchronous communication or dividing the work and holding meetings with fewer people
- Schedule meetings or talk spontaneously using the new voice channels in the “SALAS DE REUNIÓN” (meeting rooms) category on Discord
- Pin important messages in the project channel, e.g., task allocation, day of the next meeting, … To pin a message, click on the three dots and select “Pin message”
- For better clarity, you can also create a shared document with team members to write down the project’s objective, allocate tasks, and more (and pin the link in the chat)
Good luck to all participants! 🌟