Methodological advances in psychological and educational testing – Part I
Matthias von Davier (Ed.)
CONTENT
Performance of the bootstrap Rasch model test under violations of non-intersecting item response functions
Moritz Heene, Clemens Draxler, Matthias Ziegler & Markus Bühner
INTAKT: A new instrument for assessing the quality of mother-child interactions
Nicole Hirschmann, Ursula Kastner-Koller, Pia Deimann, Nadine Aigner & Tanja Svecz
Special Topic – Part 1:
Methodological advances in psychological and educational testing
Guest Editor: Matthias von Davier
Guest Editorial
Matthias von Davier
Investigation of model fit and score scale comparability in international assessments
Maria Elena Oliveri & Matthias von Davier
Modeling response times with latent variables: Principles and applications
Wim J. van der Linden
A review of recent response-time analyses in educational testing
Yi-Hsuan Lee & Haiwen Chen
On the impact of missing values on item fit and the model validness of the Rasch model
Christine Hohensinn & Klaus D. Kubinger
ABSTRACTS
Performance of the bootstrap Rasch model test under violations of non-intersecting item response functions
Moritz Heene, Clemens Draxler, Matthias Ziegler & Markus Bühner
Abstract
The Rasch model is known in particular for its properties of parameter separability and specific objectivity. The extent to which these properties are attained depends on the magnitude of the discrepancy between the data and the model. Reliable model fit tests that can detect model violations are therefore essential before a psychological test is put to use and inferences are drawn based on the requirements of the Rasch model. This paper provides a critical analysis of the performance of the parametric bootstrap model test (von Davier, 1997) in the presence of non-parallel item response functions, which violate a basic requirement of the dichotomous Rasch model. Results from simulated data show that the bootstrap test too often fails to reject non-fitting data.
Keywords: Rasch model, Rasch model test, Rasch model violation, bootstrap
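
To make the procedure under study concrete, the following is a minimal Python sketch of a parametric bootstrap model test for the dichotomous Rasch model. The crude logit-based estimates, the outfit-type fit statistic, and all names are illustrative assumptions, not the estimators or statistic used in the paper; a real analysis would use a proper Rasch estimator (e.g., conditional ML).

import numpy as np

rng = np.random.default_rng(0)

def rasch_prob(theta, b):
    # P(X = 1) under the dichotomous Rasch model for abilities theta
    # (length n) and item difficulties b (length k); returns an n x k array.
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def crude_estimates(X):
    # Quick logit-based parameter estimates -- illustration only, not the
    # conditional/marginal ML estimators a real study would use.
    eps = 1e-3
    p_item = np.clip(X.mean(axis=0), eps, 1 - eps)
    b = -np.log(p_item / (1 - p_item))
    b -= b.mean()  # fix the origin of the scale
    p_pers = np.clip(X.mean(axis=1), eps, 1 - eps)
    theta = np.log(p_pers / (1 - p_pers))
    return theta, b

def fit_statistic(X, theta, b):
    # Sum of squared standardized residuals (an outfit-type statistic).
    P = rasch_prob(theta, b)
    return np.sum((X - P) ** 2 / (P * (1 - P)))

def bootstrap_p_value(X, B=500):
    # Parametric bootstrap: refit on each dataset simulated from the fitted
    # Rasch model and locate the observed statistic in the bootstrap
    # distribution.
    theta, b = crude_estimates(X)
    t_obs = fit_statistic(X, theta, b)
    t_boot = np.empty(B)
    for r in range(B):
        Xr = (rng.random(X.shape) < rasch_prob(theta, b)).astype(float)
        th_r, b_r = crude_estimates(Xr)
        t_boot[r] = fit_statistic(Xr, th_r, b_r)
    return np.mean(t_boot >= t_obs)

# Data generated with unequal slopes (a 2PL world), i.e. crossing,
# non-parallel item response functions that violate the Rasch model.
n, k = 500, 20
theta_true = rng.normal(size=n)
b_true = rng.normal(size=k)
a_true = rng.uniform(0.5, 2.0, size=k)
P = 1.0 / (1.0 + np.exp(-a_true * (theta_true[:, None] - b_true)))
X = (rng.random((n, k)) < P).astype(float)
print("bootstrap p-value:", bootstrap_p_value(X))

If the test behaved well, data generated this way should yield small p-values; the paper's finding is that it too often does not.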
Dr. Moritz Heene
Karl-Franzens-University Graz
Department of Psychology
Unit of Psychological Diagnostics
Maiffredygasse 12b
A-8010 Graz, Austria
moritz.heene@uni-graz.at
INTAKT: A new instrument for assessing the quality of mother-child interactions
Nicole Hirschmann, Ursula Kastner-Koller, Pia Deimann, Nadine Aigner & Tanja Svecz
Abstract
Despite abundant evidence for the influence of primary caregivers' interactions with young children on the children's further development, there is a lack of standardized, published inventories for assessing the quality of such interactions. INTAKT, a newly developed instrument for rating maternal sensitivity, maternal feedback, and maternal interaction in joint-attention episodes, is designed to close this gap. Two studies examined the psychometric properties of INTAKT, applying it to different kinds of mother-child dyads. Inter-rater reliabilities, as well as validation data using internal and external criteria, showed that the INTAKT scales allow for an objective, reliable, and valid assessment of the interaction quality between mothers and their children. The inventory is thus suitable as a diagnostic instrument for assessing the quality of mother-child interactions.
Keywords: mother-child interaction, sensitivity, feedback, joint attention, assessment
Mag. Nicole Hirschmann
University of Vienna
Faculty of Psychology
Department of Developmental Psychology and Psychological Assessment
Liebiggasse 5
A-1010 Vienna, Austria
nicole.hirschmann@univie.ac.at
Investigation of model fit and score scale comparability in international assessments
Maria Elena Oliveri & Matthias von Davier
Abstract
This study used item response data from 30 countries that participated in the Programme for International Student Assessment (PISA). It compared the reduction in the proportion of item misfit associated with alternative item response theory (IRT) models (multidimensional and multi-parameter Rasch and two-parameter logistic, 2PL) and linking approaches (mean-mean IRT vs. Lagrangian multiplier and concurrent calibration) to the approaches currently used by PISA for score scale calibration. The analyses were conducted with the general diagnostic model (GDM), a modeling framework that contains all IRT models used in the paper as special cases. The paper also investigated whether an alternative score scale (i.e., one that combines international parameters with a subset of country-specific parameters), as compared to a scale based solely on international parameters, led to improved fit in country score scale calibrations. Analyses were conducted using discrete mixture distribution IRT as well as multiple-group (M-)IRT models. Compared to a scale that uses all international parameters, substantial improvement in fit was obtained using the concurrent calibration linking approach with the multiple-group 2PL model allowing for partially unique country parameters.
Keywords: international large-scale assessments, item response theory, general diagnostic model, trends
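
As a point of reference for the models being compared, here is a minimal LaTeX sketch of a multiple-group 2PL with partially unique country parameters; the notation is assumed for illustration and may differ from the paper's.

% Multiple-group 2PL for country g, item i, and ability \theta:
P\bigl(X_i = 1 \mid \theta, g\bigr)
  = \frac{\exp\{a_i^{(g)}(\theta - b_i^{(g)})\}}
         {1 + \exp\{a_i^{(g)}(\theta - b_i^{(g)})\}}
% Fully international calibration: a_i^{(g)} = a_i and b_i^{(g)} = b_i
% for every country g. Partially unique calibration: a small subset of
% items retains country-specific parameters (a_i^{(g)}, b_i^{(g)}).
% The Rasch model is the special case a_i^{(g)} \equiv 1.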
Maria Elena Oliveri, PhD ABD
University of British Columbia
Vancouver, BC, Canada
oliveri.m@live.com
Modeling response times with latent variables: Principles and applications
Wim J. van der Linden
Abstract
The introduction of the computer in psychological and educational testing has enabled us to record response times on test items in real time, without any interruption of the response process. This article reviews key principles for probabilistic modeling of these response times and discusses a hierarchical model that follows from the principles. It then shows the potential of the model for improving the current practices of item calibration, adaptive testing, controlling test speededness, and detection of cheating.
Keywords: adaptive testing, cheating detection, item calibration, hierarchical modeling, item-response theory, latent-variable modeling, response-time modeling, speededness, test design
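
For orientation, a sketch of the lognormal measurement model that van der Linden's hierarchical framework uses at its first level (van der Linden, 2006); the parameterization shown is a standard one and may differ in detail from the article.

% Lognormal response-time model: person j, item i
\ln T_{ij} = \beta_i - \tau_j + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N\!\bigl(0, \alpha_i^{-2}\bigr)
% \tau_j: speed of test taker j; \beta_i: time intensity of item i;
% \alpha_i: discrimination (inverse residual SD) of item i.
% At the second level, ability and speed (\theta_j, \tau_j) and the item
% parameters receive joint (e.g., multivariate normal) population models,
% which is what links the response times to the response model.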
Wim J. van der Linden, PhD
CTB/McGraw-Hill
20 Ryan Ranch Road
Monterey, CA 93940, USA
wim_vanderlinden@ctb.com
A review of recent response-time analyses in educational testing
Yi-Hsuan Lee & Haiwen Chen
Abstract
The use of computer-based assessments in educational testing allows researchers to collect response times on a large scale for each test taker in standardized operational settings. The availability of these data sources has spurred research aimed at utilizing response times to aid current psychometric practice. This paper provides an overview of such research published since 2000. Future research directions are discussed.
Keywords: Response time, time-limit tests, high-stakes testing, low-stakes testing
Yi-Hsuan Lee, PhD
Educational Testing Service
Rosedale Rd
Princeton, NJ 08541, USA
ylee@ets.org
On the impact of missing values on item fit and the model validness of the Rasch model
Christine Hohensinn & Klaus D. Kubinger
Abstract
A crucial issue in the development and calibration of an aptitude test is the presence of missing values. In most test administrations, examinees omit individual items, even in high-stakes tests. The most common procedure for treating these missing values in data analysis is to score the omitted responses as incorrect; an alternative is to treat them as if the items had not been administered to the examinee in question. Previous research has found that both procedures result in biased item and person parameter estimates. In test construction, interest lies not only in item parameter estimation but also in global and item-specific model tests as well as goodness-of-fit indices; on the basis of such statistics, it is decided which items constitute the final item pool of a test. The present study therefore investigates the influence of the two procedures for dealing with missing values on global and item-specific model tests as well as item fit indices for the Rasch model. The impact of the two treatment alternatives is shown for an empirical example and for simulated data. The simulations reveal that the global model test, as well as the item-specific tests, is affected by the procedure used to deal with missing values. In sum, the results indicate that scoring omitted items as incorrect leads to seriously biased results.
Keywords: missing values, Rasch model, item fit, model test, goodness-of-fit statistic
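
A minimal Python sketch of the two missing-value treatments compared in the study; the simulated data, the missingness mechanism, and all names are illustrative assumptions, not the paper's design.

import numpy as np

rng = np.random.default_rng(1)

# Simulate Rasch-conforming responses, then delete ~10% at random
# (NaN marks an omitted item).
n, k = 1000, 15
theta = rng.normal(size=n)
b = np.linspace(-1.5, 1.5, k)
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))
X = (rng.random((n, k)) < P).astype(float)
X[rng.random((n, k)) < 0.10] = np.nan

# Treatment 1: score omitted responses as incorrect.
p_incorrect = np.nan_to_num(X, nan=0.0).mean(axis=0)

# Treatment 2: treat omitted responses as not administered (ignore them).
p_ignored = np.nanmean(X, axis=0)

# Under incorrect-scoring every item looks harder; difficulty estimates,
# and in turn model and item fit statistics, inherit this distortion.
print(np.round(p_incorrect - p_ignored, 3))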
Christine Hohensinn, PhD
Division of Psychological Assessment and Applied Psychometrics
Faculty of Psychology
University of Vienna
Liebiggasse 5
A-1010 Vienna, Austria
christine.hohensinn@univie.ac.at
Psychological Test and Assessment Modeling
Volume 53 · 2011 · Issue 3
Pabst, 2011
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)