DEPENDENCY PARSING METHODS AND ALGORITHMS FOR TEXTS OF THE UZBEK NATIONAL CORPUS

Authors

  • Botir Elov

Keywords:

dependency analysis, Universal Dependencies, UzbBERT, biaffine parser, Uzbek language syntax, graph-based model.

Abstract

This article investigates dependency-parsing methods for the automatic syntactic analysis of texts
in the Uzbek national corpus. First, it examines from a theoretical standpoint how the agglutinative
morphology and free word order of Uzbek affect dependency trees. It then compares a range of parsers,
from traditional transition-based models to graph-based and transformer-based architectures. For the
experiments, an annotated dataset of 20,000 sentences was built from the updated national corpus
using the Universal Dependencies (UD_Uzbek) tag set. A graph-based biaffine parser built on the
UzbBERT and XLM-R models achieved F1 = 91.8%, outperforming existing approaches (F1 ≈ 88%)
and the UDPipe baseline (F1 ≈ 85%). In particular, the developed model reduced errors in compound
sentences by 27%. The evaluation set contained more than 5,000 sentences. The average dependency
length was reduced by 0.42 tokens, which yielded a 3% F1 gain in detecting parallel semantic roles.
These results confirm the effectiveness of moving to modern neural parsers adapted to the
morphological complexity and free word order of Uzbek.
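As an illustration only, the arc-scoring step of a graph-based biaffine parser (in the style of Dozat and Manning) can be sketched as follows. Each candidate arc from head j to dependent i receives the score dep[i]ᵀ·U·head[j] + uᵀ·head[j]. The toy vectors, the weight matrix `U`, the bias vector `u`, and the function names `biaffine_scores` and `greedy_heads` are hypothetical. In the system described above, token representations would come from UzbBERT or XLM-R. A full parser would also decode a well-formed tree (for example with a maximum spanning tree algorithm) rather than choosing each head greedily as this sketch does.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [dot(row, v) for row in m]


def biaffine_scores(dep: List[Vector], head: List[Vector], U: Matrix, u: Vector) -> Matrix:
    """Hypothetical helper: scores[i][j] = dep[i]^T U head[j] + u^T head[j],
    i.e. the score of token j being the syntactic head of token i.
    Position 0 is assumed to be the artificial ROOT token."""
    scores: Matrix = []
    for d in dep:
        row = []
        for h in head:
            # bilinear term (dependent-head interaction) + linear term (head prior)
            row.append(dot(d, mat_vec(U, h)) + dot(u, h))
        scores.append(row)
    return scores


def greedy_heads(scores: Matrix) -> List[int]:
    """Hypothetical helper: pick the highest-scoring head for every real token.
    ROOT (index 0) gets -1. Greedy choice may not form a tree; real parsers
    use MST/Eisner decoding instead."""
    heads = [-1]
    for i in range(1, len(scores)):
        candidates = [(scores[i][j], j) for j in range(len(scores)) if j != i]
        heads.append(max(candidates)[1])
    return heads


# Toy example: ROOT plus two tokens, 2-dimensional made-up representations.
dep_reps = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
head_reps = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
U = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the bilinear term is a plain dot product
u = [0.0, 0.0]                # zero bias: no head is preferred a priori

print(greedy_heads(biaffine_scores(dep_reps, head_reps, U, u)))  # [-1, 0, 1]
```

With these toy values, token 1 attaches to ROOT and token 2 attaches to token 1. Separating dependent and head representations is what lets the biaffine form score the same word differently in its two roles.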


Published

2025-07-10