Articles | Open Access | https://doi.org/10.37547/ijll/Volume05Issue10-36

Linguistic Issues Of Search Formation Of Units Of The Uzbek Language Based On Linguistic Tags

Yuldashev Aziz Uyg‘un o‘g‘li, Teacher at TashDO'TAU, Uzbekistan

Abstract

This article addresses the creation of an Uzbek language corpus and of a search system based on linguistic tags that makes the corpus effective to use. It first analyzes the linguistic units that fall within the scope of search: word forms, lemmas, syntactic units, collocations, phrases, and grammatical constructions. It then describes the interface elements, filters, methods of displaying results, and statistical indicators used in corpus search. The article presents the main types of corpus tagging, namely lexical (POS), morphological, syntactic, and semantic tags, and their effect on search capabilities. It also covers in detail linguistic search methods based on n-gram analysis, collocation detection, and regular expressions (regex). These approaches make it possible to use the Uzbek language corpus effectively in scientific research, language teaching, and automatic language processing.
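As a rough illustration of the kind of tag-based search the abstract describes, the following minimal Python sketch filters a tiny hand-tagged token list by lemma, POS tag, or a regular expression over the word form, and computes a collocation score for a lemma bigram. Everything here is an assumption made for illustration: the toy corpus, the tag names, and the function names `search` and `mi_score` are hypothetical and are not taken from the article or from any existing Uzbek corpus tool. The score uses the common mutual information form log2(f_AB · N / (f_A · f_B)).

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (word form, lemma, POS tag) triples.
# Sentences are flattened into one token list for simplicity.
TOKENS = [
    ("men", "men", "PRON"),
    ("yangi", "yangi", "ADJ"),
    ("kitob", "kitob", "NOUN"),
    ("o‘qidim", "o‘qi", "VERB"),
    ("u", "u", "PRON"),
    ("yangi", "yangi", "ADJ"),
    ("kitoblarni", "kitob", "NOUN"),
    ("oldi", "ol", "VERB"),
]


def search(tokens, lemma=None, pos=None, form_regex=None):
    """Return word forms matching every filter that is supplied."""
    hits = []
    for form, tok_lemma, tok_pos in tokens:
        if lemma is not None and tok_lemma != lemma:
            continue
        if pos is not None and tok_pos != pos:
            continue
        if form_regex is not None and not re.search(form_regex, form):
            continue
        hits.append(form)
    return hits


def mi_score(tokens, lemma_a, lemma_b):
    """Mutual information of an adjacent lemma pair: log2(f_AB * N / (f_A * f_B))."""
    lemmas = [tok_lemma for _, tok_lemma, _ in tokens]
    n = len(lemmas)
    unigrams = Counter(lemmas)
    bigrams = Counter(zip(lemmas, lemmas[1:]))  # adjacent lemma pairs (2-grams)
    f_ab = bigrams[(lemma_a, lemma_b)]
    if f_ab == 0:
        return float("-inf")  # the pair never co-occurs in this corpus
    return math.log2(f_ab * n / (unigrams[lemma_a] * unigrams[lemma_b]))


if __name__ == "__main__":
    print(search(TOKENS, lemma="kitob"))           # all forms of one lemma
    print(search(TOKENS, pos="VERB"))              # all tokens with a given tag
    print(search(TOKENS, form_regex=r"larni$"))    # plural accusative ending
    print(mi_score(TOKENS, "yangi", "kitob"))      # collocation strength
```

Searching by lemma rather than by word form is what lets a single query retrieve both "kitob" and "kitoblarni", which matters for an agglutinative language such as Uzbek. A real system would also respect sentence boundaries when counting bigrams, which this sketch ignores.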

Keywords

Uzbek language corpus, linguistic tags, search engine, lemmatization

References

Bober, N., Kapranov, Y., Kukarina, A., & Tron, T. (2021). British National Corpus in English language teaching of university students.

Sharipov, M., Mattiev, J., Sobirov, J., & Baltayev, R. (2022). Creating a morphological and syntactic tagged corpus for the Uzbek language. arXiv preprint arXiv:2210.15234.

Bobojonova, L., Akhundjanova, A., Ostheimer, P., & Fellenz, S. (2025). BBPOS: BERT-based Part-of-Speech Tagging for Uzbek. arXiv preprint arXiv:2501.10107.

Xudayberganov, N. (2024). O‘zbek tili korpusiga morfologik ishlov berish. Computer linguistics: problems, solutions, prospects, 1(1).

Elov, B., & Ahmedova, M. (2024). N-gramlar asosida imloni tuzatish tizimini ishlab chiqish. Uzbekistan: Language and Culture, 3(3).

Rasulov, Z. I. (2025). Tilshunoslikning zamonaviy yo‘nalishlari moduli bo‘yicha o‘quv-uslubiy majmua. Buxoro.

Madatov, K., Bekchanov, S., & Vičič, J. (2022). Dataset of stopwords extracted from Uzbek texts. Data in Brief, 43, 108351.

Smith, G. (2003). Searching for morphological structure with regular expressions. Tiger Projektbericht, Univ. Potsdam.

Avgustinova, T., & Zhang, Y. (2009, September). Exploiting the Russian national corpus in the development of a Russian Resource Grammar. In Proceedings of the workshop on adaptation of language resources and technology to new domains (pp. 1-11).

Xolmo‘minovna, A.O. (2022, September). Morphological Annotation System in The Corpus of Internet Information Texts in The Uzbek Language. In 2022 7th International Conference on Computer Science and Engineering (UBMK) (pp. 154-158). IEEE.

https://kunansy.github.io/RNC/

http://web-corpora.net/

https://www.sketchengine.eu/glossary/mi-score/

How to Cite

Yuldashev Aziz Uyg‘un o‘g‘li. (2025). Linguistic Issues Of Search Formation Of Units Of The Uzbek Language Based On Linguistic Tags. International Journal Of Literature And Languages, 5(10), 163–170. https://doi.org/10.37547/ijll/Volume05Issue10-36