A Study of the Assessment of Translations and Post-editing in Neural Machine Translation

doi:10.12002/j.bisu.355

Abstract

Although neural machine translation (NMT) technology has advanced considerably and NMT systems are rapidly being made practical and commercialized, their performance in vertical (domain-specific) fields remains unsatisfactory. This study examines English-Chinese translations of military texts produced by mainstream machine translation systems in China and overseas. On a 1 000-sentence test set drawn from an independently constructed military corpus, the average BLEU score of the five systems (Google, Baidu, Tencent, NetEase Youdao, and Sogou) is only 20.854, which is 6.62 BLEU points lower than their average on general-domain text. Of the 5 050 errors identified, spanning 15 types in four major categories (spelling, vocabulary, syntax, and semantics), mistranslated military terms account for the largest share (42.83%), followed by mistranslated common words and hierarchical-structure errors. These results show that existing NMT systems cannot yet translate military texts well enough to meet practical needs; further post-editing research is therefore urgently needed to improve the accuracy of military text translation.

Keywords: neural machine translation; assessment of translations; post-editing; military texts; types of translation errors

摘要：

尽管神经机器翻译技术已经取得了巨大进步,业界也正在加速推进神经机器翻译系统实用化和商品化的进程,但其在垂直领域的表现还不尽如人意。本研究以军事领域英译汉文本为研究对象,这些译文均由国内外主流神经机器翻译系统完成,在自主构建的1 000个军事题材译文测试数据集中,谷歌、百度、腾讯、有道、搜狗5个翻译系统的BLEU均值仅为20.854,较之其通用语料译文BLEU值相差6.62。实验结果显示：在军事题材译文的拼写、词汇、句法和语义4大类15种共5 050处错误中,军事术语翻译错误占比最高,为42.83%;其次为普通词语误译和层级结构错误。实验结果表明,目前现有的神经机器翻译系统尚不能实现高质量的军事文本翻译,无法满足现实需求,亟需进行译后编辑研究,以提高军事文本翻译的准确率。

关键词: 神经机器翻译, 译文评测, 译后编辑, 军事文本, 翻译错误类型

CLC Number: H085

Guo Wanghao, Hu Fumao. A Study of the Assessment of Translations and Post-editing in Neural Machine Translation[J]. Journal of Beijing International Studies University, 2021, 43(5): 66-82.

郭望皓, 胡富茂. 神经机器翻译译文评测及译后编辑研究[J]. 北京第二外国语学院学报, 2021, 43(5): 66-82.

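Corpus statistics for the military and general test sets
Text type	English source: sentences	English source: words	Chinese translation: sentences	Chinese translation: characters
Military texts	1 000	26 799	1 000	47 643
General texts	1 000	25 376	1 000	44 172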

BLEU scores by system on the military test set
System	N	Min	Max	Mean	Std. dev.
Google	1 000	5.254 7	80.910 7	20.585 884	9.088 675 3
Baidu	1 000	5.829 7	90.247 1	22.025 858	10.274 628 2
Tencent	1 000	4.067 1	89.396 2	21.496 574	9.691 479 9
Youdao	1 000	4.532 3	80.910 7	18.649 330	7.344 606 4
Sogou	1 000	5.411 4	92.053 0	21.504 140	9.881 029 8

BLEU scores by system on the general test set
System	N	Min	Max	Mean	Std. dev.
Google	1 000	10.374 5	90.552 1	27.896 913	8.984 716 5
Baidu	1 000	10.275 2	92.378 6	27.521 496	9.234 731 6
Tencent	1 000	11.632 8	88.921 4	27.697 215	8.593 270 2
Youdao	1 000	10.021 1	89.770 5	26.465 237	8.953 749 6
Sogou	1 000	10.773 9	88.507 6	27.745 608	9.137 096 7
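
The headline figures in the abstract can be approximately re-derived from the two BLEU tables. The sketch below is illustrative only and rests on two assumptions that the page does not state: that the BLEU table with per-system means near 20 reports the military test set and the one with means near 27 the general test set, and that the abstract's average is an unweighted mean of the five per-system means. The names `MILITARY_MEAN_BLEU`, `GENERAL_MEAN_BLEU`, and `macro_average` are hypothetical. Under these assumptions the military average comes to about 20.85 and the gap to about 6.61, close to but not identical with the 20.854 and 6.62 reported, so the authors' exact aggregation may differ.

```python
# Per-system mean BLEU values copied from the "Mean" column of the two
# BLEU tables. Which table is military and which is general is an
# assumption inferred from the abstract, not stated in the tables.
MILITARY_MEAN_BLEU = {
    "Google": 20.585884,
    "Baidu": 22.025858,
    "Tencent": 21.496574,
    "Youdao": 18.649330,
    "Sogou": 21.504140,
}
GENERAL_MEAN_BLEU = {
    "Google": 27.896913,
    "Baidu": 27.521496,
    "Tencent": 27.697215,
    "Youdao": 26.465237,
    "Sogou": 27.745608,
}


def macro_average(scores):
    """Unweighted mean of per-system means (an assumed aggregation)."""
    return sum(scores.values()) / len(scores)


military_avg = macro_average(MILITARY_MEAN_BLEU)
general_avg = macro_average(GENERAL_MEAN_BLEU)
gap = general_avg - military_avg

# BLEU lost by each system when moving from general to military text.
drop_by_system = {
    name: GENERAL_MEAN_BLEU[name] - MILITARY_MEAN_BLEU[name]
    for name in MILITARY_MEAN_BLEU
}
largest_drop = max(drop_by_system, key=drop_by_system.get)

if __name__ == "__main__":
    print(f"military average: {military_avg:.3f}")
    print(f"general average:  {general_avg:.3f}")
    print(f"gap:              {gap:.3f}")
    for name, drop in sorted(drop_by_system.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} drop: {drop:.3f}")
```

Because every system was scored on the same 1 000 sentences in each set, the unweighted mean of the per-system means equals the mean over all 5 000 sentence-level scores, so weighting by sentence count would not change the result.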