Research

Language technology
Corpus Linguistics
Lexicography / Valency / Cooccurrence phenomena
Knowledge Resources / Artificial Intelligence

I work in the area of language technology.

I am a member of the Interdisciplinary Centre for Research on Lexicography, Valency and Collocation at the Friedrich-Alexander University of Erlangen-Nuremberg.

I am also a member of the Association for Computational Linguistics (ACL), the European Language Resources Association (ELRA), the European Association for Lexicography (EURALEX), the European Association for Terminology (EAFT), and the German Society for Computational Linguistics and Language Technology (GSCL).

My interests include Human Language Processing and Technology, Computational Linguistics, Corpus Linguistics, Language Resources, and Digital Humanities. So far, I have been active in the following areas: Tokenization, Computational Morphology, Electronic Lexicography, Part-of-Speech Tagging, Syntax (Valency), Sentiment Analysis, Semantic Similarity, Implicit Emotion Recognition, and Cooccurrence Phenomena (Collocation).

Recurring key aspects of my research are:

Fundamentals: Development of fundamental linguistic resources such as part-of-speech tagsets (e.g. for Albanian), language corpora, and lexicons.
Methodology: Development of methods and techniques, e.g. for Sentiment Analysis, Semantic Similarity, and Translation Inference Across Dictionaries.

Language technology

Key publications

Kabashi, Besim, and Thomas Proisl. 2018. “Albanian Part-of-Speech Tagging: Gold Standard and Evaluation.” In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), edited by Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, 2593–9. Miyazaki, Japan: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2018/pdf/89.pdf. [pdf, bib]
Kabashi, Besim. 2015. Automatische Verarbeitung der Morphologie des Albanischen. 1st ed. Erlangen, Germany: FAU University Press. [pdf, bib]
Handl, Johannes, Besim Kabashi, Thomas Proisl, and Carsten Weber. 2009. “JSLIM – Computational Morphology in the Framework of the SLIM Theory of Language.” In State of the Art in Computational Morphology. Workshop on Systems and Frameworks for Computational Morphology (SFCM 2009), edited by Cerstin Mahlow and Michael Piotrowski, 10–27. Berlin, Heidelberg, New York: Springer. https://doi.org/10.1007/978-3-642-04131-0_2. [pdf, bib]
Kabashi, Besim. 2009. “Das Albanische Alphabet aus sprachtechnologischer Sicht.” In Der Kongress von Manastir. Herausforderung zwischen Tradition und Neuerung in der albanischen Schriftkultur, edited by Bardhyl Demiraj, 189–227. PHILOLOGIA - Sprachwissenschaftliche Forschungsergebnisse. Hamburg, Germany: Dr. Kovač. http://www.verlagdrkovac.de/3-8300-4705-3.htm. [pdf, bib]

Corpus Linguistics

Key publications

Proisl, Thomas, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach, and Stefan Evert. 2020. “EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus.” In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), edited by Nicoletta Calzolari, Sara Goggi, and Hélène Mazo, 6144–50. Marseille, France: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.754.pdf. [pdf, bib]
Blombach, Andreas, Natalie Dykes, Philipp Heinrich, Besim Kabashi, and Thomas Proisl. 2020. “A Corpus of German Reddit Exchanges (GeRedE).” In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020), edited by Nicoletta Calzolari, Sara Goggi, and Hélène Mazo, 6312–8. Marseille, France: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.774.pdf. [pdf, bib]
Kabashi, Besim. 2017. “AlCo – një korpus tekstesh i gjuhës shqipe me njëqind milionë fjalë.” Seminari Ndërkombëtar për Gjuhën, Letërsinë dhe Kulturën Shqiptare, no. 36: 123–32. [pdf, bib]

Lexicography / Valency / Cooccurrence phenomena

Key publications

Kabashi, Besim. 2019. “Collecting Collocations for the Albanian Language.” In Proceedings of the Sixth Biennial Conference on Electronic Lexicography: Electronic Lexicography in the 21st Century (eLex 2019), Sintra, Portugal, October 1–3, 2019, edited by Iztok Kosem, Tanara Zingano Kuhn, et al., 478–89. Brno, Czech Republic: Lexical Computing, s.r.o. https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_27.pdf. [pdf, bib]
Kabashi, Besim. 2018. “A Lexicon of Albanian for Natural Language Processing.” Lexicographica 34 (1): 239–48. https://doi.org/10.1515/lex-2018-340112. [DOI, bib]
Kabashi, Besim. 2007. “Pronominal Clitics and Valency in Albanian. A Computational Linguistics Perspective and Modelling Within the LAG-Framework.” In Valency: Theoretical, Descriptive and Cognitive Issues, edited by Thomas Herbst and Katrin Götz-Votteler, 187:339–52. Trends in Linguistics. Studies and Monographs. Berlin, Germany / New York, USA: Mouton de Gruyter. https://doi.org/10.1515/9783110198775.4.339. [pdf, bib]

Knowledge Resources / Artificial Intelligence

Key publications

Zilio, Leonardo, and Besim Kabashi. 2024. “Using Neural Machine Translation for Normalising Historical Documents.” In Proceedings of the XXI EURALEX International Congress: Lexicography and Semantics, 827–39. Dubrovnik Cavtat, Croatia: Institut za hrvatski jezik. http://euralex.org/wp-content/themes/euralex/proceedings/Euralex 2024/EURALEX2024_Pr_p827-839_Zilio-Kabashi.pdf.pdf. [pdf, bib]
Gracia, Jorge, Besim Kabashi, Ilan Kernerman, Marta Lanau-Coronas, and Dorielle Lonke. 2019. “Results of the Translation Inference Across Dictionaries 2019 Shared Task.” In Translation Inference Across Dictionaries 2019 Shared Task, edited by Jorge Gracia, Besim Kabashi, and Ilan Kernerman. Co-located with the 2nd Language, Data and Knowledge Conference (LDK 2019), Leipzig, Germany, May 20, 2019. http://ceur-ws.org/Vol-2493/summary.pdf. [pdf, bib]
Proisl, Thomas, Philipp Heinrich, Besim Kabashi, and Stefan Evert. 2018. “EmotiKLUE at IEST 2018: Topic-Informed Classification of Implicit Emotions.” In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, edited by Alexandra Balahur, Saif M. Mohammad, Veronique Hoste, and Roman Klinger, 235–42. Brussels, Belgium: Association for Computational Linguistics. http://aclweb.org/anthology/W18-6234. [pdf, bib]
Proisl, Thomas, Philipp Heinrich, Stefan Evert, and Besim Kabashi. 2017. “Translation Inference Across Dictionaries via a Combination of Graph-Based Methods and Co-Occurrence Statistics.” In Proceedings of the LDK 2017 Workshops: 1st Workshop on the OntoLex Model (OntoLex-2017), Shared Task on Translation Inference Across Dictionaries & Challenges for Wordnets, edited by John P. McCrae, Francis Bond, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Jorge Gracia, Ilan Kernerman, Elena Montiel-Ponsoda, Noam Ordan, and Maciej Piasecki, 94–102. Galway, Ireland: CEUR-WS.org. http://ceur-ws.org/Vol-1899/TIAD17_paper_1.pdf. [pdf, bib]
Proisl, Thomas, Stefan Evert, Paul Greiner, and Besim Kabashi. 2014. “SemantiKLUE: Robust Semantic Similarity at Multiple Levels Using Maximum Weight Matching.” In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), edited by Preslav Nakov and Torsten Zesch, 532–40. Dublin, Ireland: Association for Computational Linguistics. http://www.aclweb.org/anthology/S14-2093. [pdf, bib]