Many scholars have worked on evaluating English language achievement tests, but most of these evaluations remain unsatisfactory, leaving a gap for a study that applies both classical test theory (CTT) and item response theory (IRT). This study evaluated an English language achievement test using IRT and CTT among senior secondary school students in the Ibadan metropolis. A descriptive survey research design was used. The sample was drawn using a stratified random sampling technique to select one local government area in Ibadan and simple random sampling to select two schools (one public and one private). The instrument was a standardized test of 50 questions selected from Sections 4 and 5 and Sections 1 and 2 of the WAEC June English language paper. The results show that the English language test did not satisfy the criterion of item unidimensionality. Based on the stated criteria (0.20 < p < 0.80 and rpb > 0.15), 16 items were judged poor on the classical difficulty index and 13 items were judged poor on the classical discrimination index. Taking the classical difficulty and discrimination indices together, 29 items were judged poor, in addition to the three items initially excluded due to item redundancy; consequently, only 15 of the 50 English language test items survived the CTT criteria. From the test information function, items whose difficulty parameters fell between -0.9 and 3.0 were judged good and retained, while the 5 items whose difficulty parameters fell outside this range were judged poor and removed. The number of surviving items thus differed between the two frameworks: under the CTT framework, 16 items had extreme difficulty indices and 15 items had high discrimination indices, while under the IRT framework only 5 items showed extreme values on the TIF curve. Only one item was common to the sets of items excluded under the CTT and IRT frameworks. The study also shows that IRT item parameters are generally more invariant than CTT parameters when IRT model fit is good, whereas CTT item parameters are more invariant when IRT model fit is poor. Based on these findings, test developers and public examining bodies should integrate IRT models into their test development processes. However, because item statistics from the IRT and CTT frameworks are comparable in some cases, it is recommended that the CTT framework be used as a complement to IRT.
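
The CTT screening described above is fully specified by the stated cut-offs. The sketch below (not the authors' actual analysis code; the 0/1 response matrix is simulated purely for illustration) shows how classical item difficulty (p) and the corrected point-biserial discrimination (rpb) could be computed and checked against the criteria 0.20 < p < 0.80 and rpb > 0.15.

```python
# Minimal sketch of the CTT item screening reported in the abstract.
# The data here are simulated; only the criteria come from the study.
import numpy as np

def classical_item_stats(responses: np.ndarray):
    """responses: examinees x items matrix of 0/1 scores."""
    total = responses.sum(axis=1)                      # each examinee's total score
    p = responses.mean(axis=0)                         # item difficulty (proportion correct)
    r_pb = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]  # corrected point-biserial
        for j in range(responses.shape[1])
    ])
    keep = (p > 0.20) & (p < 0.80) & (r_pb > 0.15)     # criteria stated in the study
    return p, r_pb, keep

# Illustrative use with simulated data (50 items, as in the test analysed here)
rng = np.random.default_rng(0)
sim = (rng.random((200, 50)) < rng.uniform(0.2, 0.9, 50)).astype(int)
p, r_pb, keep = classical_item_stats(sim)
print(f"{keep.sum()} of {sim.shape[1]} items meet the CTT criteria")
```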
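
For the IRT side, the abstract reports screening items by a difficulty range of -0.9 to 3.0 using the test information function. The sketch below assumes a 2PL model (the abstract does not name the model) and uses illustrative, not estimated, item parameters; it shows how the TIF could be computed and the stated difficulty range applied.

```python
# Minimal sketch of TIF-based screening under an assumed 2PL model.
# Item parameters are illustrative; only the difficulty range comes from the study.
import numpy as np

def test_information(theta, a, b):
    """TIF under a 2PL model: I(theta) = sum_j a_j^2 * P_j(theta) * (1 - P_j(theta))."""
    theta = np.asarray(theta)[:, None]                 # ability grid as a column
    prob = 1.0 / (1.0 + np.exp(-a * (theta - b)))      # item response probabilities
    return (a**2 * prob * (1.0 - prob)).sum(axis=1)    # information summed over items

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, 50)                          # discrimination parameters (illustrative)
b = rng.normal(0.5, 1.5, 50)                           # difficulty parameters (illustrative)
keep = (b >= -0.9) & (b <= 3.0)                        # difficulty range stated in the study
theta_grid = np.linspace(-4, 4, 81)
tif = test_information(theta_grid, a[keep], b[keep])
print(f"{keep.sum()} items retained; peak information = {tif.max():.2f}")
```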