Abstract: With the development of the digital economy, tax return documents are increasingly obtained through a fully online process, with the customer's authorization, and their key information is then reviewed. This business process relies heavily on manual labor, and reviewing the documents is time-consuming. To address this problem, this paper proposes a fully intelligent document processing framework for tax return text files. The framework augments traditional robotic process automation (RPA) with natural language processing (NLP) and multi-SVM classifiers. Combined with a fully automatic parsing method, it identifies, classifies, and extracts the key text information in tax declaration files and provides corresponding query services to downstream users through APIs. To evaluate the effectiveness of the framework, extensive experiments were conducted on customer data from a financial institution. The SVM classifier achieves a Precision, Recall, and F1-value of 97.23%, 98.41%, and 96.42%, respectively. The entire process takes less than 5 minutes for a single customer, with a precision rate of 98.12%. These results indicate that the framework has high commercial value and has essentially reached commercial standards.
Keywords: Robotic process automation, Multi-SVM classifiers, NLP, Understanding unstructured tax documents