Analyzing the influence of hyperparameters on the efficiency of OCR model for pre-reform handwritten texts

P. A. Sherstnev; K. D. Kozhin; A. V. Pyataeva

doi:10.31857/S0132347425030071

Authors: Sherstnev P.A.¹, Kozhin K.D.¹, Pyataeva A.V.¹
Affiliations:
1. Artificial Intelligence Center of Siberian Federal University
Issue: No 3 (2025)
Pages: 70–79
Section: COMPUTER GRAPHICS AND VISUALIZATION
URL: https://modernonco.orscience.ru/0132-3474/article/view/688124
DOI: https://doi.org/10.31857/S0132347425030071
EDN: https://elibrary.ru/GRLAPG
ID: 688124

Abstract

This article examines how hyperparameters affect the efficiency of optical character recognition models for handwritten texts of the pre-reform period, using the 19th-century handwritten reports of the governors of the Yenisei province as a case study. It compares model configurations that differ in their architectural components, including normalization modules, feature extraction blocks, and predictors. Particular attention is paid to how input image resolution and hidden layer size affect the balance between prediction accuracy and computational cost. The results identify key parameters for developing optical character recognition systems adapted to historical texts with non-standard orthography and complex structure. Directions for further research include evaluating synthetic methods for extending the training data and analyzing alternative architectures such as transformers.
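A comparison of this kind can be organized as a grid over the configuration axes named in the abstract. The sketch below is only an illustration of that idea: the class name `OcrConfig`, the function `build_grid`, and every listed value (module names, image sizes, hidden sizes) are hypothetical placeholders and are not taken from the article.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class OcrConfig:
    """One candidate model configuration (all fields are illustrative)."""
    normalization: str       # normalization module
    feature_extractor: str   # feature extraction block
    predictor: str           # prediction head
    image_size: tuple        # input resolution as (height, width)
    hidden_size: int         # size of the hidden layers

    @property
    def input_pixels(self) -> int:
        # Rough proxy for how input cost grows with resolution.
        height, width = self.image_size
        return height * width


# Hypothetical values for each axis; the article's actual choices may differ.
NORMALIZATIONS = ["none", "tps"]
FEATURE_EXTRACTORS = ["vgg", "resnet"]
PREDICTORS = ["ctc", "attention"]
IMAGE_SIZES = [(32, 100), (64, 256)]
HIDDEN_SIZES = [256, 512]


def build_grid() -> list:
    """Enumerate every combination of the axes above."""
    axes = product(NORMALIZATIONS, FEATURE_EXTRACTORS, PREDICTORS,
                   IMAGE_SIZES, HIDDEN_SIZES)
    return [OcrConfig(*values) for values in axes]


if __name__ == "__main__":
    grid = build_grid()
    print(f"{len(grid)} configurations to train and evaluate")
    for config in grid[:3]:
        print(config, "input pixels:", config.input_pixels)
```

Each configuration would then be trained and scored separately, so that accuracy can be weighed against cost for every combination rather than for one axis at a time.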

Keywords

optical character recognition, hyperparameters, handwritten text recognition, pre-reform orthography, normalization modules, neural networks, historical documents, model architecture, accuracy, optimization
