Venue and Date
The first Workshop on "Beyond English: Natural Language Processing for All Languages in an Era of Large Language Models" (GlobalNLP 2025) will be held at the RANLP 2025 conference in Varna, Bulgaria, on 11-13 September 2025.
Brief Technical Description
Natural Language Processing (NLP) has advanced dramatically with the introduction of Large Language Models (LLMs) and Generative AI, greatly enhancing text generation, machine translation, and knowledge retrieval for high-resource languages such as English, Chinese, German, Spanish, and French. However, a large proportion of the world's languages continue to face significant challenges due to data scarcity, linguistic complexity, and limited computing resources. These range from low-resource languages (e.g., Indigenous, African, Indian, and minority languages) through under-resourced languages (e.g., Irish) to medium-resource languages (e.g., Baltic, South Asian, and Slavic languages).
This workshop is dedicated to advancing NLP for all languages—high-resource, medium-resource, under-resourced, and low-resource alike. We aim to foster an inclusive environment that addresses the linguistic and technical needs of every language community, regardless of resource availability.
We encourage both technical and non-technical papers containing experimental, theoretical, or methodological contributions. We explicitly seek interdisciplinary proposals that focus on participatory methods to develop NLP. This workshop intends to examine creative strategies that bridge the NLP gap across all language categories, utilizing cutting-edge techniques such as (but not limited to):
- Data-Efficient NLP: Transfer learning, few-shot and zero-shot techniques to overcome data limitations.
- Multilingual and Cross-Lingual Models: Approaches for training and adapting models to diverse linguistic structures, including morphologically rich and agglutinative languages.
- Semantic-Based Approaches: Ontology-based information extraction, semantic similarity, entity linking, and connection extraction.
- Linked Data and Knowledge Graphs: Structured knowledge for machine translation, information retrieval, and reasoning for underrepresented languages.
- Practical Applications: Real-world use in education, healthcare, climate action, government policy, and multilingual content creation.
- Corpus Creation and Linguistic Tools: Tools for building corpora, model development, evaluation metrics, and error analysis.
- Reusability of Linguistic Resources: Reuse in tasks such as machine translation, POS tagging, and syntactic parsing, among others.
- Digital Humanities and Cultural Heritage: Computational approaches to historical texts, linguistic preservation, and the integration of NLP with humanities disciplines such as history, literature, and cultural studies.
- Practical Applications of Large Language Models: Applications of LLMs across diverse domains such as software engineering (code generation, bug detection, documentation), Internet of Things (natural language interfaces, device management), image processing (multimodal models combining text and visuals), and recommendation systems (personalized content, user intent modeling), enabling interdisciplinary communication and collaboration.
This workshop brings together academics, industry experts, and linguists to collaborate on making NLP more inclusive, equitable, and effective for all languages.
Target Audience
This workshop is designed for NLP researchers, linguists, industry experts, and AI practitioners working on language technologies, especially those in resource-constrained environments. The main goal is to bring together established and emerging scholars to discuss new ways to construct, optimize, and implement Large Language Models (LLMs) and other NLP techniques for low-resource, medium-resource, and underrepresented languages.
We anticipate between 20 and 40 participants, including academic researchers, industry executives, and students. Our contributors will come from universities, research institutes, technology businesses, and non-profit organizations that specialize in language technology development, linguistic resource generation, and computational modeling for languages.
Workshop Format
- Keynote talks by prominent researchers
- Paper presentations (oral and poster sessions)
- Panel discussion on multilingual LLM challenges
- Interactive demos and practical applications
- All accepted papers will be published in ACL proceedings
Organizing Committee
Program Committee
- Alexander Gelbukh (Instituto Politécnico Nacional, Mexico)
- Bidyut Kumar Patra (IIT BHU, India)
- Clarence Teo (Nanyang Technological University, Singapore)
- Gaurish Thakkar (University of Zagreb, Croatia)
- Helena Moniz (Universidade de Lisboa, Lisbon, Portugal)
- Idris Abdulmumin (DSFSI, University of Pretoria)
- Ibrahim Said Ahmad (Northeastern University)
- Juri Opitz (University of Zurich, Switzerland)
- Luan Thanh Nguyen (Vietnam National University Ho Chi Minh City, Vietnam)
- Marie-Aude Lefer (UCLouvain, Belgium)
- Mohammed Hasanuzzaman (Queen's University Belfast, UK)
- Moritz Schaeffer (Johannes Gutenberg University of Mainz, Germany)
- Muslim Jameel Sayed (Atlantic Technological University, Ireland)
- Pádraic Moran (University of Galway, Ireland)
- Paolo Rosso (Valencia Polytechnic University, Spain)
- Paul Buitelaar (University of Galway, Ireland)
- Soumik Mandal (NYU Tandon School of Engineering, USA)
- Surangika Ranathunga (Massey University, New Zealand)
- Uthayasanker Thayasivam (University of Moratuwa, Sri Lanka)
Important Dates
- Paper submission deadline: 25 July 2025 (previously 22 July 2025). The deadline has passed; submissions after this date will not be reviewed.
- Notification of acceptance: 10 August 2025
- Camera-ready versions deadline: 25 August 2025
- Camera-ready proceedings ready: 8 September 2025
- RANLP Conference: 8–10 September 2025 (Monday–Wednesday)
- Workshops and Shared Tasks: 11–13 September 2025 (Thursday–Saturday)
- GlobalNLP 2025 Workshop: 12 September 2025 (Friday)
- Paper submission link: https://softconf.com/ranlp25/GlobalNLP2025/
Registration
Only one author per accepted paper is required to register and pay the applicable fee
for the paper to be included in the RANLP proceedings and subsequently published in
the ACL Anthology.
Please review the detailed registration categories, fees, and deadlines here:
https://ranlp.org/ranlp2025/index.php/fees-registration/
📅 GlobalNLP–RANLP 2025 Workshop
Beyond English: NLP for All Languages in an Era of LLMs
📍 RANLP 2025, Bulgaria — 12 September 2025 (Friday)
🕒 All times: Bulgaria local time
09:00–09:45
Invited Talk 1
Prof. Dipti Misra Sharma
Professor Emeritus, IIIT Hyderabad, India
Talk Title: Multilingualism, LLMs and Machine Translation
Session 1: Corpora & Language Resources
- 09:45–10:00 → Towards the Creation of a Collao Quechua–Spanish Parallel Corpus Using Optical Character Recognition
Gian Carlo Orcotoma Mormontoy, Lida Leon Nuñez and Hugo Espetia Huamanga
- 10:00–10:15 → C A N C E R: Corpus for Accurate Non-English Cancer-related Educational Resources
Anika Harju, Asma Shakeel, Tiantian He, Tianqi Xu and Dr. Aaro Harju
- 10:15–10:30 → Quality Matters: Measuring the Effect of Human-Annotated Translation Quality on English-Slovak Machine Translation
Matúš Kleštinec and Daša Munková
- 10:30–10:45 → Automatic Animacy Classification for Latvian Nouns
Ralfs Brutāns and Jelke Bloem
☕ Coffee Break 11:00–11:15
Session 2: Multilinguality & LLM Evaluation
- 11:15–11:30 → Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs
Deshan Koshala Sumanathilaka, Nicholas Micallef and Julian Hough
- 11:30–11:45 → What Language(s) Does Aya-23 Think In? How Multilinguality Affects Internal Language Representations
Katharina A. T. T. Trinley, Toshiki Nakai, Tatiana Anikina and Tanja Baeumel
- 11:45–12:00 → Checklist Engineering Empowers Multilingual LLM Judges
Mohammad Ghiasvand Mohammadkhani and Hamid Beigy
- 12:00–12:15 → Identifying Contextual Triggers in Hate Speech Texts Using Explainable Large Language Models
Dheeraj Kodati and Bhuvana Sree Lakkireddy
- 12:15–12:30 → Fine-Grained Arabic Offensive Language Classification with Taxonomy, Sentiment, and Emotions
Natalia Vanetik, Marina Litvak and Chaya Liebeskind
🍴 Lunch Break 12:45–13:25 (40 minutes)
13:25–14:10
Invited Talk 2
Prof. Michael Madden
Established Professor, School of Computer Science, University of Galway, Ireland
Talk Title: Advances in Natural Language Processing and Machine Learning for Medicine
Session 3: Low-Resource Models & Embeddings
- 14:10–14:25 → Development of a Low-Cost Named Entity Recognition System for Odia Language using Deep Active Learning
Tusarkanta Dalai, Tapas Kumar Mishra, Pankaj Kumar Sa, Prithviraj Mohanty, Chittaranjan Swain and Ajit Kumar Nayak
- 14:25–14:40 → Non-Contextual BERT or FastText? A Comparative Analysis
Abhay Shanbhag, Suramya Jadhav, Amogh Thakurdesai, Ridhima Bhaskar Sinare and Raviraj Joshi
- 14:40–14:55 → GeistBERT: Breathing Life into German NLP
Raphael Scheible-Schmitt and Johann Frei
- 14:55–15:10 → A Study on the Language Independent Stemmer in the Indian Language IR
Siba Sankar Sahu and Sukomal Pal
- 15:10–15:25 → PortBERT: Navigating the Depths of Portuguese Language Models
Raphael Scheible-Schmitt, Henry He and Armando B. Mendes
☕ Coffee Break 15:25–15:40
Session 4: Clinical & Assistive NLP
- 15:40–15:55 → Kantika: A Knowledge-Radiant Framework for Dermatology QA using IR-CoT and RAPTOR-Augmented Retrieval
Deep Das, Vikram Mehrolia, Rahul Dixit and Rohit Kumar
- 15:55–16:10 → DRISHTI: Drug Recognition and Integrated System for Helping the Visually Impaired with Tag-based Identification
Sajeeb Das, Srijit Paul, Ucchas Muhury, Akib Jayed Islam, Dhruba Jyoti Barua, Sultanus Salehin and Prasun Datta
- 16:10–16:25 → From Pixels to Prompts: Evaluating ChatGPT-4o in Face Recognition, Age Estimation, and Gender Classification
Jashn Jain, Praveen Kumar Chandaliya and Dhruti P. Sharma
- 16:25–16:40 → FedCliMask: Context-Aware Federated Learning with Ontology-Guided Semantic Masking for Clinical NLP
Srijit Paul, Sajeeb Das, Ucchas Muhury, Akib Jayed Islam, Dhruba Jyoti Barua, Sultanus Salehin and Prasun Datta
Session 5: Evaluation, Sentiment & Stylistics
- 16:40–16:55 → Bootstrapping a Sentence-Level Corpus Quality Classifier for Web Text using Active Learning
Maximilian Bley, Thomas Eckart and Christopher Schröder
- 16:55–17:10 → Assessing the Accuracy of AI-Generated Idiom Translations
Marijana Gašparović, Marija Brala Vukanović and Marija Brkić Bakarić
- 17:10–17:25 → Spatio-Temporal Mechanism in Multilingual Sentiment Analysis
Adarsh Singh Jadon, Vivek Tiwari, Chittaranjan Swain and Deepak Kumar Dewangan
- 17:25–17:40 → Measuring Prosodic Richness in LLM-Generated Responses for Conversational Recommendation
Darshna Parmar and Pramit Mazumdar
Paper Submission
Authors are encouraged to submit their original research papers via the official RANLP 2025 submission portal. Please follow the provided guidelines carefully to ensure a smooth submission and review process.
Submission Guidelines
Submissions must follow the RANLP 2025 submission guidelines, using ACL-style templates (LaTeX or MS Word).
- Regular papers: Up to 8 pages (excluding references). Additional pages for references are allowed.
- Short papers: Up to 6 pages (excluding references). Additional pages for references are allowed.
- Poster/Demo papers: Up to 4 pages (excluding references). Additional pages for references are allowed.
Publication: Accepted papers will be included in the ACL Anthology.
Submission Portal: For paper templates and submission guidelines, please visit the official RANLP website: https://ranlp.org/ranlp2025/index.php/submissions/. To submit your paper, use the dedicated submission system here: https://softconf.com/ranlp25/GlobalNLP2025/.