Volume 16, no. 4, pages 61-70

Effective Practices of Using Spatial Models in Document Image Classification

O.A. Slavin, I.M. Janiszewski
This paper presents a new approach to modeling the structure of document images for classification. Each document image is treated as a realization of a stochastic point process, and estimates of the properties of that process are used to describe the document structure. The main objective is to determine the type of a new document using a nonparametric classification method. To this end, a method for classifying functional properties of point processes, based on the concept of statistical depth, is proposed. Practical aspects of the experiments are discussed. Experiments on real data demonstrated the effectiveness of the proposed approach.
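As a rough illustration of depth-based classification, the sketch below maps a new observation to its DD-plot coordinates (its depth with respect to each class) and assigns it to the class in which it lies deepest. Several simplifications are assumptions made for this example and are not taken from the paper: each document is reduced to a hypothetical two-dimensional feature vector rather than a functional summary of a point pattern, the Mahalanobis depth is used as the depth function, and a plain maximum-depth rule stands in for the alpha-procedure named in the keywords. All function names and the synthetic data are illustrative.

```python
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth of x w.r.t. a sample (rows are observations).

    Depth is 1 at the sample mean and decreases towards 0 far from it.
    """
    mean = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    diff = x - mean
    d2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return 1.0 / (1.0 + d2)

def dd_coordinates(x, classes):
    """DD-plot coordinates of x: one depth value per training class."""
    return np.array([mahalanobis_depth(x, s) for s in classes])

def classify_max_depth(x, classes):
    """Assign x to the class in which it is deepest (maximum-depth rule)."""
    return int(np.argmax(dd_coordinates(x, classes)))

# Synthetic stand-in for per-document feature vectors of two document types.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
class_b = rng.normal(loc=[5.0, 5.0], scale=1.0, size=(200, 2))

new_doc = np.array([4.8, 5.1])          # hypothetical new document
depths = dd_coordinates(new_doc, [class_a, class_b])
label = classify_max_depth(new_doc, [class_a, class_b])
print(depths, label)
```

In a fuller treatment, the separating rule in the DD-plot would be learned from the training depths rather than fixed to the diagonal, and the depth would be computed on functional summaries of the point patterns.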
Keywords
documents with flexible structure; classification; spatial point process; reproducible point patterns; depth; DD-plot; alpha-procedure.