Ari Arifin Danuwijaya


Developing a test is a complex, iterative process that is subject to revision even when the items are written by skilful item writers. Many commercial test publishers therefore conduct test analysis rather than relying solely on item writers’ judgement and skills, because item quality needs to be verified statistically after a tryout. This study is part of a test development process and aims to analyse reading comprehension test items. One hundred multiple-choice questions were pilot tested with 50 postgraduate students at one university. The pilot test was intended to investigate item quality so that the items could be developed further. The responses were analysed under Classical Test Theory using the psychometric software Lertap. The results showed that most items were of average difficulty. In terms of item discrimination, more than half of the items were categorized as marginal and required further modification. The study offers recommendations for improving the quality of the developed items.

Keywords: reading comprehension; item analysis; classical test theory; item difficulty; test development.
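
The two statistics named in the abstract, item difficulty and item discrimination, can be illustrated with a minimal sketch. This is not the computation performed by Lertap or by the study: the upper-lower discrimination index, the 27% group size, the category cut-offs (commonly attributed to Ebel's guidelines), and the tiny response matrix are all assumptions chosen for illustration.

```python
def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly (p-value)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower index D = p_upper - p_lower (group size is an assumption)."""
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


def label_discrimination(d):
    """Assumed cut-offs; the study's own thresholds are not stated here."""
    if d >= 0.40:
        return "very good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "marginal"
    return "poor"


# Hypothetical scored responses: one row per examinee, one column per item,
# 1 = correct, 0 = incorrect. Not data from the study.
scores = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]

for i in range(len(scores[0])):
    p = item_difficulty(scores, i)
    d = item_discrimination(scores, i)
    print(f"item {i + 1}: difficulty={p:.2f}, discrimination={d:.2f} "
          f"({label_discrimination(d)})")
```

Under these assumptions, an item labelled "marginal" is one that high scorers answer correctly only slightly more often than low scorers, which is why such items are flagged for revision.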








Copyright (c) 2018 English Review: Journal of English Education