MFSM: Chinese-English sentence alignment based on multi-feature self-attention mechanism fusion
Blog Article
Bilingual parallel corpora are a fundamental resource for statistical natural language processing. Chinese-English bilingual text contains cross alignments and empty alignments, which tend to degrade the quality of Chinese-English sentence alignment. We therefore propose a novel Chinese-English sentence alignment method based on multi-feature self-attention mechanism fusion. First, the length features of Chinese-English bilingual sentences are integrated into the GloVe word vectors. Then, a bidirectional gated recurrent unit (BiGRU) encodes the feature word vectors to obtain more fine-grained local sentence information.
Second, an interactive attention mechanism is introduced to extract global information from the bilingual sentences, ensuring that contextual semantic features are used effectively. Finally, the Kuhn-Munkres (KM) algorithm is applied on top of a multi-layer perceptron, which allows the model to handle non-monotonically aligned text and improves its generalization ability. Experiments show that the F-measure of the proposed method exceeds 90%, and that the method effectively improves the precision and recall of sentence alignment as well as the efficiency of constructing Chinese-English parallel corpora.
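To make the final KM step more concrete, here is a minimal sketch of how a Kuhn-Munkres style assignment could pick sentence pairs from a score matrix. It is an illustration only, not the authors' implementation: the function name `km_align`, the example scores, and the idea that `scores[i][j]` comes from the multi-layer perceptron are all assumptions. The routine is the standard potential-based Hungarian method, written in pure Python.

```python
from math import inf


def km_align(scores):
    """Return the one-to-one (source, target) pairing with maximum total score.

    scores[i][j] is a hypothetical alignment score between source sentence i
    and target sentence j. Requires len(scores) <= len(scores[0]).
    """
    n, m = len(scores), len(scores[0])
    if n > m:
        raise ValueError("expected at most as many source as target sentences")
    # The routine below minimises total cost, so negate the scores.
    cost = [[-s for s in row] for row in scores]
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (m + 1)    # column potentials (1-based, index 0 is a dummy)
    match = [0] * (m + 1)  # match[j] = row currently assigned to column j
    way = [0] * (m + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so one more column becomes reachable.
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:  # reached a free column: path is complete
                break
        # Flip the assignments along the augmenting path.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sorted((match[j] - 1, j - 1) for j in range(1, m + 1) if match[j])


# Hypothetical scores for 3 source and 3 target sentences with a cross alignment.
scores = [
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.3, 0.7],
]
print(km_align(scores))  # [(0, 1), (1, 0), (2, 2)]
```

Because the assignment is chosen globally rather than left to right, source sentence 0 can pair with target sentence 1 while source sentence 1 pairs with target sentence 0, which is the non-monotonic case the abstract mentions. With more target than source sentences, the leftover target sentences simply stay unpaired, which is one simple way to represent an empty alignment.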