Natural Language Processing Research Group

BUKNLP is a research group for Natural Language Processing and Machine Learning at Bayero University, Kano, Nigeria. The group consists of academic researchers in computer science and linguistics, as well as students.

The group's research activities include sentiment analysis, social media analysis, machine translation, computational social science, information retrieval, textual analysis, and multilingual natural language processing, as well as the creation of linguistic resources (dictionaries and annotated corpora) for applications of various types. Recently, the group has focused on natural language processing for low-resource languages and related tasks.

Research Areas

Natural Language Processing

Machine Learning

HausaNLP

Meet the Team

Researchers

Bello Shehu Bello

Lecturer in Computer Science

Machine Learning, Social Media Analysis, Natural Language Processing, Computational Social Science

Ibrahim Said Ahmad

Lecturer in Information Technology

Data Mining, Machine Learning, Sentiment Analysis

Jaafar Zubairu Maitama

Lecturer in Computer Science

Natural Language Processing, Summarization, Machine Learning, Sentiment Analysis

Mahmud Yusuf Ahmad

Lecturer in Computer Science

Data Mining, Machine Learning, Learning Analytics, Big Data

Shamsuddeen Hassan Muhammad

Lecturer in Computer Science

Sentiment Analysis, Machine Learning, Data Science, Low-resource NLP

Suhail Kamal

Lecturer in Information Technology

Sign Language Recognition, Sign Language Translation, Machine Translation

Collaborators

Lecturer in Computer Science

Machine Translation, Natural Language Processing

Ahmadu Shehu

Assistant Professor of English and Literature at the American University of Nigeria, Yola

Cognitive Linguistics, Cultural Linguistics, Structural aspects of African languages

Idris Abdulmumin

Lecturer in Computer Science at Ahmadu Bello University, Zaria (ABU)

Neural Machine Translation, Low Resource Languages

Featured Publications

Recent Publications

Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer

October 2020

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. “Low-resourced”-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.

Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa

June 2020

Using Self-Training to Improve Back-Translation in Low Resource Neural Machine Translation

Improving neural machine translation (NMT) models using the back-translations of the monolingual target data (synthetic parallel data) is currently the state-of-the-art approach for training improved translation systems. The quality of the backward system - which is trained on the available parallel data and used for the back-translation - has been shown in many studies to affect the performance of the final NMT model. In low resource conditions, the available parallel data is usually not enough to train a backward model that can produce the qualitative synthetic data needed to train a standard translation model. This work proposes a self-training strategy where the output of the backward model is used to improve the model itself through the forward translation technique. The technique was shown to improve baseline low resource IWSLT'14 English-German and IWSLT'15 English-Vietnamese backward translation models by 11.06 and 1.5 BLEUs respectively. The synthetic data generated by the improved English-German backward model was used to train a forward model which out-performed another forward model trained using standard back-translation by 2.7 BLEU.

Ibrahim Said Ahmad, Azuraliza Abu Bakar, Mohd Ridzwan Yaakub, Shamsuddeen Hassan Muhammad

January 2020, SN Computer Science

A survey on machine learning techniques in movie revenue prediction

Ibrahim Said Ahmad, Azuraliza Abu Bakar, Mohd Ridzwan Yaakub, Mohammad Darwich

January 2020, International Journal of Advanced Computer Science and Applications (IJACSA)

Beyond Sentiment Classification: A Novel Approach for Utilizing Social Media Data for Business Intelligence

Shamsuddeen Hassan Muhammad, Pavel Brazdil, Alípio Jorge

January 2020, European Conference on Information Retrieval

Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon

Projects

HausaNLP

This project aims to develop Hausa language resources for natural language processing tasks, such as a Hausa Social Media Corpus, a Hausa Sentiment Lexicon, HausaNER, and part-of-speech (POS) tagging resources.

Join Us

We are always open to collaboration with motivated researchers and students who are passionate about our research interests.

shmuhammad.csc@buk.edu.ng
+2348039647291
Faculty of Computer Science and Information Technology, Bayero University, Kano