A Comparison of the Rating Tendencies of Native and Non-native Raters in Korean Speaking Assessment

Jee Young Kim (김지영) 1, *
Author Information & Copyright
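1 Ewha Language Center, Ewha Womans University
*Corresponding Author: Korean language instructor, Department of Korean Language Education, Ewha Language Center, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, E-mail: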

© Copyright 2019 Language Education Institute, Seoul National University. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: Mar 01, 2019; Revised: Mar 25, 2019; Accepted: Apr 02, 2019

Published Online: Apr 30, 2019


This study compared the rating tendencies of Korean and Chinese raters in a Korean speaking test. Graduate students majoring in Korean language education were trained as raters and then rated the performances individually. The ratings were analyzed with the multi-faceted Rasch model, focusing on rating consistency, severity, and bias. The analysis showed that rating severity differed among raters even within the same group, and that some raters in the Chinese rater (CR) group showed either overfit or misfit in rating consistency. Compared with the Korean rater (KR) group, the CR group tended to score less difficult test items more leniently and more difficult test items more strictly. In the analysis of rating criteria, the KR group scored more strictly than the CR group on three criteria, the exceptions being "vocabulary and grammar" and "organization". For "organization", the tendency was reversed, and the difference was statistically significant. The two groups also differed in how they judged test takers' proficiency. The differences appeared for test takers of relatively high proficiency and for those whose pronunciation or intonation reflected a specific language, and some of these differences were identified as significant bias. Finally, in the use of the rating scale, the CR group's scores were distributed more widely, but its use of the score of 0 was less reliable, and the differences between the mean scores of adjacent scale points were not uniform. This group also tended toward the middle of the scale rather than its upper end.

Keywords: Korean speaking test; Korean rater; Chinese rater; rating tendency
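
For readers unfamiliar with the multi-faceted Rasch model named in the abstract, the sketch below illustrates its general rating scale form, in which the log-odds of receiving score k rather than k-1 equal test taker ability minus item difficulty minus rater severity minus the threshold for step k. It is a minimal illustration only: the function names, the three facets shown, and all numeric values are assumptions made for this example, not the facet specification, software, or estimates used in the study.

```python
import math

def category_probabilities(ability, item_difficulty, rater_severity, thresholds):
    """Return P(score = k) for k = 0..len(thresholds).

    All arguments are in logits. thresholds[k-1] is the step threshold F_k
    for moving from score k-1 to score k (hypothetical values in this sketch).
    """
    eta = ability - item_difficulty - rater_severity
    # Log-numerator of score k is the running sum of (eta - F_j) for j <= k;
    # the score-0 category has a log-numerator of 0 by convention.
    log_numerators = [0.0]
    for f_k in thresholds:
        log_numerators.append(log_numerators[-1] + eta - f_k)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(ability, item_difficulty, rater_severity, thresholds):
    """Model-expected score for one test taker, item, and rater combination."""
    probs = category_probabilities(ability, item_difficulty, rater_severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))

if __name__ == "__main__":
    steps = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 0-3 scale
    lenient = expected_score(1.0, 0.2, -0.5, steps)
    severe = expected_score(1.0, 0.2, 0.5, steps)
    print(f"expected score, lenient rater: {lenient:.2f}")
    print(f"expected score, severe rater:  {severe:.2f}")
```

Holding ability and item difficulty fixed, a rater with a higher severity value yields a lower expected score, which is the sense in which the abstract speaks of raters differing in severity.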


