Analyzing the influence of hyperparameters on the efficiency of OCR model for pre-reform handwritten texts

Abstract

The article examines how hyperparameters affect the efficiency of optical character recognition models for pre-reform handwritten text, using the 19th-century handwritten reports of the governors of the Yenisei province as a case study. It compares model configurations that differ in their architectural components, including normalization modules, feature extraction blocks, and predictors. Particular attention is paid to the role of input image resolution and hidden layer size in balancing prediction accuracy against computational cost. The results identify key parameters for developing optical character recognition systems adapted to historical texts with non-standard orthography and complex structure. Directions for further research include evaluating synthetic methods for expanding the training data and analyzing alternative architectures such as transformers.
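The balance between prediction accuracy and computational cost can be illustrated with a Pareto front over candidate configurations. The sketch below is only a minimal illustration: the `Config` type, the configuration names, and all numeric values are hypothetical and are not taken from the article, and parameter count is assumed as the cost measure.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """Hypothetical model configuration (illustrative values only)."""
    name: str
    accuracy: float   # higher is better
    params_m: float   # assumed cost proxy: millions of parameters, lower is better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that no other configuration dominates.

    A configuration is dominated when another one is at least as accurate
    and at least as cheap, and strictly better on one of the two criteria.
    """
    front = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.params_m <= c.params_m
            and (o.accuracy > c.accuracy or o.params_m < c.params_m)
            for o in configs
        )
        if not dominated:
            front.append(c)
    # Order the front from cheapest to most expensive configuration.
    return sorted(front, key=lambda c: c.params_m)


# Made-up example data, not results from the article.
configs = [
    Config("A", 0.80, 5.0),
    Config("B", 0.85, 12.0),
    Config("C", 0.83, 20.0),  # dominated by B: less accurate and larger
    Config("D", 0.90, 30.0),
]
front = pareto_front(configs)
front_names = [c.name for c in front]
print(front_names)
```

Configurations on the front are the candidates worth comparing further; any configuration off the front can be replaced by one that is no worse on either criterion.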

About the authors

P. Sherstnev

Artificial Intelligence Center of Siberian Federal University

Corresponding author.
Email: sherstpasha99@gmail.com
ORCID iD: 0000-0003-2816-9433
Russia, Akademika Kirenskogo 26, k. 1, Krasnoyarsk, 660074

K. Kozhin

Artificial Intelligence Center of Siberian Federal University

Email: kozhin-sfu@yandex.ru
ORCID iD: 0009-0003-4966-2427
Russia, Akademika Kirenskogo 26, k. 1, Krasnoyarsk, 660074

A. Pyataeva

Artificial Intelligence Center of Siberian Federal University

Email: anna4u@list.ru
ORCID iD: 0000-0002-0140-263X
Russia, Akademika Kirenskogo 26, k. 1, Krasnoyarsk, 660074

Supplementary files

1. JATS XML
2. Fig. 1. Scan of a page from the report of the governor of Yenisei province for 1858.
3. Fig. 2. The interface of the Anno OCR program with a marked page of the report of the governor of the Yenisei province.
4. Fig. 3. Pareto front for different neural network configurations.
5. Fig. 4. Average number of hidden parameters of the model depending on the configuration.
6. Fig. 5. Average accuracy value depending on configuration.

© Russian Academy of Sciences, 2025