A Reliability Analysis of TEPS Section Scores and Total Scores

Euijin Lim 1
1 Seoul National University

ⓒ Copyright 2018 Language Education Institute, Seoul National University. This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Published Online: Nov 01, 2019


The purpose of the current study is to investigate the reliability and stability of the section and total (composite) scores of TEPS from a classical test theory perspective. For the reliability analyses, multiple data sets were collected not only from operational TEPS administrations before and after the revision but also from the four pilot tests administered during the TEPS revision process. Cronbach’s (1951) alpha coefficients were computed for the four section scores of each test, while Feldt and Brennan’s (1989) composite score reliability coefficients were computed for the total scores. These coefficients were examined and compared across different test forms of the original and revised TEPS. Coefficients of equivalence and stability, as well as correlation coefficients between forms before and after the revision, were also examined to see how stable TEPS scores were. The results showed that the TEPS section scores and the total score were reliable and that the changes introduced by the revision did not reduce score stability. The total score had high reliability (above 0.9), indicating that TEPS can be used as a dependable indicator of Korean learners’ English proficiency to inform language-related decision making.

Keywords: TEPS; reliability; internal-consistency reliability; coefficient of equivalence and stability
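
The two internal-consistency coefficients named in the abstract can be illustrated with a minimal sketch. It assumes scored item responses are available as examinee-by-item matrices, one per section, and that the total score is a weighted sum of section scores with unit weights by default. The function names, the weighting, and the simulated data are assumptions made for illustration and do not come from the study. The composite coefficient uses the commonly cited form 1 - Σ w_j² σ_j² (1 - α_j) / σ_C², which may differ in detail from the computation actually used for TEPS.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an examinee-by-item score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                 # number of items
    item_var_sum = items.var(axis=0, ddof=1).sum()     # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)          # variance of the section score
    return k / (k - 1) * (1 - item_var_sum / total_var)


def composite_reliability(sections, weights=None):
    """Reliability of a weighted sum of section scores.

    Computes 1 - sum_j w_j^2 * var_j * (1 - alpha_j) / var_composite,
    treating measurement errors as uncorrelated across sections.
    """
    if weights is None:
        weights = [1.0] * len(sections)                # assumption: unit weights
    w = np.asarray(weights, dtype=float)
    # One column of section scores per section.
    scores = np.column_stack(
        [np.asarray(s, dtype=float).sum(axis=1) for s in sections]
    )
    composite = scores @ w
    error_var = 0.0
    for j, section in enumerate(sections):
        section_var = scores[:, j].var(ddof=1)
        error_var += w[j] ** 2 * section_var * (1 - cronbach_alpha(section))
    return 1 - error_var / composite.var(ddof=1)


if __name__ == "__main__":
    # Simulated right/wrong responses (hypothetical data, not TEPS results).
    rng = np.random.default_rng(0)
    ability = rng.normal(size=500)
    sections = [
        (ability[:, None] + rng.normal(size=(500, n_items)) > 0).astype(int)
        for n_items in (20, 15, 15, 20)
    ]
    for idx, section in enumerate(sections, start=1):
        print(f"section {idx} alpha: {cronbach_alpha(section):.3f}")
    print(f"composite reliability: {composite_reliability(sections):.3f}")
```

Because the section scores are positively correlated, the composite variance grows faster than the summed error variance, which is why a total score can be more reliable than any single section score.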



Attali, Y., Lewis, W., & Steier, M. (2013). Scoring with the computer: Alternative procedures for improving the reliability of holistic essay scoring. Language Testing, 30, 125-141.

Bolus, R. E., Hinofotis, F. B., & Bailey, K. M. (1982). An introduction to generalizability theory in second language research. Language Learning, 32, 245-258.

Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Orlando, FL: Holt.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Educational Testing Service. (2011). TOEFL® Research Insight Series, Volume 3: Reliability and comparability of TOEFL iBT scores. Retrieved from .

Educational Testing Service. (2019). User guide for the TOEIC® listening and reading test. Retrieved from .

Fazel, I., & Ahmadi, A. (2011). On the relationship between writing proficiency and instrumental/integrative motivation among Iranian IELTS candidates. Theory and Practice in Language Studies, 1, 747-757.

Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105-146). New York: Macmillan.

Ferguson, G. A., & Takane, Y. (1989). Statistical analysis in psychology and education (6th ed.). New York, NY: McGraw-Hill.

Gessaroli, M. E., & Folske, J. C. (2002). Generalizing the reliability of tests comprised of testlets. International Journal of Testing, 2, 277-295.

Kim, J. (2016). Reliability and test length (internal document). Seoul: TEPS Center.

Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26, 275-304.

Krzanowski, W. J., & Woods, A. J. (1984). Statistical aspects of reliability in language testing. Language Testing, 1, 1-20.

Kwon, H., Lee, Y.-W., Lee, Y., Park, Y.-J., Kim, J., Jun, H., . . . Park, H. (2018). Development and validation of a pilot test form for the revised TEPS (Research Report No. 80). Seoul: SNU Language Education Institute.

Longabach, T., & Peyton, V. (2018). A comparison of reliability and precision of subscore reporting methods for a state English language proficiency assessment. Language Testing, 35, 297-317.

Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271-295.

TEPS Center. (2019). TEPS technical report: 2018 administration (internal document). Seoul: TEPS Center.

UCLES. (2007). IELTS handbook 2007. Retrieved from .

UCLES. (n.d.). Quality and accountability. Retrieved from .

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15, 263-287.