Analysis of biology midterm exam items using a comparison of classical test theory and the Rasch model
DOI: https://doi.org/10.22219/jpbi.v10i3.34345

Keywords: difficulty level, distractor effectiveness, item discrimination, reliability, validity

Abstract
In biology learning, test instruments are essential for assessing students' understanding of complex concepts. Although a well-constructed test instrument is crucial to learning evaluation, systematic item analysis is still rarely carried out in practice. This descriptive quantitative study analyzes the quality of test items using classical test theory, in terms of validity, reliability, difficulty index, discrimination power, and distractor effectiveness, alongside Rasch model analysis. The data consist of responses from 40 students to 30 multiple-choice questions on a biology midterm exam. Classical test analysis was performed in Microsoft Excel, and Rasch model analysis in Winsteps. Both approaches identify 14 valid items and 16 invalid ones. Reliability under the classical approach is 0.619 (adequate) by Cronbach's alpha, while the Rasch model yields an item reliability of 0.85 (good) and a person reliability of 0.65 (weak). Both classical test theory and the Rasch model categorize item difficulty into four levels. The classical approach produces five categories of item discrimination, while the Rasch model identifies three groups of items based on the item separation index (H = 3.45) and two groups of respondents based on ability (H = 1.96). Distractor analysis shows 93.3% functioning distractors under the classical approach and 80% under the Rasch model. The Rasch model offers greater precision in measuring student ability and detecting bias, so the two approaches should be integrated for comprehensive item analysis. Future test development should focus on revising the invalid items and improving distractor quality.
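To make the classical analysis concrete, the sketch below computes the three classical statistics the study reports (difficulty index, upper-lower discrimination index, and Cronbach's alpha) for a dichotomously scored response matrix. This is a minimal Python illustration with randomly generated data; it is not the study's Excel workbook or its actual responses.

```python
# Classical test theory item statistics: a minimal sketch.
# The 40 x 30 matrix mirrors the study's dimensions but is random,
# illustrative data, not the study's responses.
import numpy as np

rng = np.random.default_rng(0)
scores = (rng.random((40, 30)) > 0.4).astype(int)  # rows = students, cols = items

# Difficulty index p: proportion of students answering each item correctly.
p = scores.mean(axis=0)

# Discrimination index D: proportion correct in the upper 27% group minus
# the lower 27% group, with groups formed by ranking on total score.
totals = scores.sum(axis=1)
order = np.argsort(totals)
k = int(round(0.27 * scores.shape[0]))
lower, upper = scores[order[:k]], scores[order[-k:]]
D = upper.mean(axis=0) - lower.mean(axis=0)

# Cronbach's alpha: internal-consistency reliability of the whole test.
n_items = scores.shape[1]
alpha = n_items / (n_items - 1) * (1 - scores.var(axis=0, ddof=1).sum() / totals.var(ddof=1))

print(f"Cronbach's alpha = {alpha:.3f}")
for i, (pi, di) in enumerate(zip(p, D), start=1):
    print(f"item {i:2d}: difficulty p = {pi:.2f}, discrimination D = {di:+.2f}")
```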
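On the Rasch side, Winsteps estimates item difficulties and person abilities by joint maximum likelihood (JMLE). The toy function below, a sketch under simplifying assumptions, implements the basic alternating Newton-Raphson updates for the dichotomous Rasch model; it omits Winsteps' bias corrections and assumes rows and columns with perfect scores have already been removed (JMLE diverges for them). As a note on the reported indices: in standard Rasch reporting, the separation ratio G and reliability R are related by G = sqrt(R / (1 - R)), and the number of statistically distinct strata is H = (4G + 1) / 3, which, up to rounding, is how an item reliability near 0.85 supports about three item strata and a person reliability near 0.65 about two person strata.

```python
# Dichotomous Rasch model via joint maximum likelihood: a toy sketch of the
# kind of estimation Winsteps performs (the function name and structure are
# illustrative, not the Winsteps implementation).
import numpy as np

def rasch_jmle(X, n_iter=50):
    """Estimate person abilities (theta) and item difficulties (beta) for a
    0/1 response matrix X by alternating Newton-Raphson steps."""
    theta = np.zeros(X.shape[0])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # P[i, j] = probability that person i answers item j correctly.
        P = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
        # Ability update: gradient is observed minus expected raw score,
        # information is sum of P(1 - P) over items.
        theta += (X - P).sum(axis=1) / np.maximum((P * (1 - P)).sum(axis=1), 1e-9)
        P = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))
        # Difficulty update (opposite sign), then center betas at zero
        # to fix the scale's origin.
        beta -= (X - P).sum(axis=0) / np.maximum((P * (1 - P)).sum(axis=0), 1e-9)
        beta -= beta.mean()
    return theta, beta

# Usage with the simulated matrix from the previous sketch:
#   theta, beta = rasch_jmle(scores)
```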
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with JPBI (Jurnal Pendidikan Biologi Indonesia) agree to the following terms:
- For all articles published in JPBI, copyright is retained by the authors, who grant the publisher permission to publish the work under the stated conditions. When a manuscript is accepted for publication, the authors agree that the publishing right transfers automatically to the publisher.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).