PTAM 2025

CONTENTS

Discrimination between types of common systematic variation in data contaminated by method effects using CFA models
Karl Schweizer
DOI: https://doi.org/10.2440/001-0016
Full article .pdf (Diamond Open Access)

Academic and Social Profiles of Adolescents with Autism
Christiane Lange-Küttner
DOI: https://doi.org/10.2440/001-0017
Full article .pdf (Diamond Open Access)

Treating Rapid Responses as Errant Improves Reliability of Estimates for Test Performance for NAEP
Daniel B. Wright and Sarah M. Wolff
DOI: https://doi.org/10.2440/001-0018
Full article .pdf (Diamond Open Access)

Development and Factorial Validation of the Rapid Assessment Test of Individual Misconceptions About Giftedness (RATIMAG)
Merve Irem Ercan & Albert Ziegler
DOI: https://doi.org/10.2440/001-0019
Full article .pdf (Diamond Open Access)

Applicability of Process Models for the Joint Analysis of Responses and Response Times in Complex Cognitive Tasks
Raimund J. Krämer, Jochen Ranger, Marco Koch, Frank M. Spinath, Florian Schmitz
DOI: https://doi.org/10.2440/001-0020
Full article .pdf (Diamond Open Access)

Modeling Local Item Dependence in PIRLS 2016: Comparing ePIRLS and Paper-Based PIRLS Using the Rasch Testlet Model
Purya Baghaei, Hamdollah Ravand & Rolf Strietholt
DOI: https://doi.org/10.2440/001-0021
Full article .pdf (Diamond Open Access)

Equivalent Selective Attention Scores From Different Digital Devices: On the Fairness of Assessments Designed for Smartphones, Tablets, and Desktop Computers
Isabell Baldauf, Maximilian O. Steininger, Georg Mandler & Marco Vetter
DOI: https://doi.org/10.2440/001-0022
Full article .pdf (Diamond Open Access)

ABSTRACTS

Discrimination between types of common systematic variation in data contaminated by method effects using CFA models
Karl Schweizer

Abstract: In data contaminated by method effects, common systematic variation is inhomogeneous, so structural investigations must discriminate attribute-related common systematic variation from other sources of variation. In the reported study, CFA measurement models that deal differently with such inhomogeneity were compared with respect to their performance in investigating data contaminated by either speededness or high subset homogeneity. For this purpose, structured random data with five different levels of speededness or subset homogeneity, respectively, were generated and investigated. The investigations were conducted with the one-factor congeneric and tau-equivalent CFA models, as well as with the bifactor CFA model designed as a mixture of tau-equivalent and fixed-links models. In data with speededness, the congeneric model indicated good model fit, whereas the tau-equivalent model showed sensitivity to the effect. In data with subset homogeneity, both models showed sensitivity. Only the bifactor model accounted for the common systematic variation and discriminated well between the attribute and method effects.
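A minimal numpy sketch of the kind of data-generating process the abstract describes: a tau-equivalent attribute factor (equal loadings on all items) combined with a speededness method factor whose loadings increase toward the end-of-test items, the pattern a fixed-links bifactor component can capture. All numeric values are illustrative assumptions, not those used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 500, 20

# Attribute factor: tau-equivalent part with equal loadings on all items.
attribute = rng.standard_normal(n_persons)
lambda_attribute = np.full(n_items, 0.6)

# Speededness method factor: fixed, increasing loadings toward the
# end-of-test items (the fixed-links part of the bifactor model).
speed = rng.standard_normal(n_persons)
lambda_speed = np.linspace(0.0, 0.8, n_items)  # illustrative values

noise = rng.standard_normal((n_persons, n_items))
data = (np.outer(attribute, lambda_attribute)
        + np.outer(speed, lambda_speed)
        + noise)

# Attribute and method factors are generated orthogonally, which is what
# allows a bifactor model to discriminate the two sources of variation.
print(data.shape, round(np.corrcoef(attribute, speed)[0, 1], 3))
```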

Keywords: confirmatory factor analysis, discrimination, speededness, subset homogeneity, method effect

Correspondence:
Karl Schweizer, Institute of Psychology, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 6, 60323 Frankfurt a. M., Germany. K.Schweizer@psych.uni-frankfurt.de


Academic and Social Profiles of Adolescents with Autism
Christiane Lange-Küttner

Abstract: The INSIDE project is a longitudinal study of pupils who are in schools that provide inclusive education. At the beginning of the secondary school tier, there were 22 pupils with an autism diagnosis, two of them female, at ages 11 to 14 years. A comparison group was created by case-control matching on gender and exact age in months, randomly selecting datasets of pupils from a large INSIDE panel (N = 2693). Adolescents with autism had the same level of competence in reading and mathematics as the comparison group, but their language grades were lower, most likely because of shortcomings in classroom discussion contributions. Factor analysis of questionnaires about the academic and social self-concept and school inclusion explained between 66.8% and 80.1% of the variance. In adolescents with autism, clear psychological dimensions of positive self-esteem, self-control, and peer relationships emerged. In contrast, in the comparison group, peer relationships were relevant for nearly every dimension, showing the importance of social context for mainstream pupils. A further difference was that, for adolescents with autism, critical thinking and evaluation formed an important dimension, whereas for the comparison group, independent decision-making and speaking up was more relevant. Thus, while there was some common ground, differences were revealed both in the composition of the main factors and in crucial anchor items of factors that explained less variance.
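A hedged pandas sketch of the case-control matching step: one control per case is drawn at random, without replacement, from pupils with the same gender and exact age in months. The column names (autism_diagnosis, gender, age_months) are assumptions for illustration, not the INSIDE variable names.

```python
import pandas as pd

def match_controls(panel: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Draw one control per case, matched on gender and exact age in months,
    sampling randomly and without replacement from the remaining pool."""
    cases = panel[panel["autism_diagnosis"] == 1]
    pool = panel[panel["autism_diagnosis"] == 0]
    picks = []
    for _, case in cases.iterrows():
        candidates = pool[(pool["gender"] == case["gender"])
                          & (pool["age_months"] == case["age_months"])]
        if candidates.empty:
            continue  # no exact match available for this case
        pick = candidates.sample(1, random_state=seed)
        picks.append(pick)
        pool = pool.drop(pick.index)
    return pd.concat(picks) if picks else pd.DataFrame(columns=panel.columns)
```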

Keywords: Autism, Self-Concept, Peer Relationships, Self-Regulation

Correspondence:
Christiane Lange-Küttner, e-mail: c.langekuettner@uni-bremen.de


Treating Rapid Responses as Errant Improves Reliability of Estimates for Test Performance for NAEP
Daniel B. Wright and Sarah M. Wolff

Abstract: Response times are often available for educational assessments, and psychometricians have proposed methods for using them when estimating test performance. Several approaches have been explored to see if estimates can be improved. Previous research has shown that a simple mechanism, based on the idea that some people who guess do so rapidly, can improve the reliability of estimates for personalized formative assessments and college-admission data. The method involves Treating All Rapid Responses as Errant (TARRE). Here we examine whether this approach can improve reliability estimates for the National Assessment of Educational Progress (NAEP) eighth-grade math assessments in the US, data for which were recently released. Treating rapid responses as errant improved reliability estimates for multiple-choice questions (MCQs) but not for non-MCQ formats. Male test takers were affected more than female test takers, and those in the low and high proficiency categories were affected more than those in the middle proficiency categories. Using this procedure, or any procedure that takes response times into account, outside of a research context raises further considerations: the scoring method can affect student behavior and learning, and these consequences are discussed.
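A minimal sketch of the TARRE idea as described above: responses faster than a threshold are scored as incorrect before performance is estimated. The 3-second threshold and the item data are illustrative assumptions, not the values used for NAEP.

```python
import numpy as np

def tarre_rescore(correct: np.ndarray,
                  rt_seconds: np.ndarray,
                  threshold: float = 3.0) -> np.ndarray:
    """Treat All Rapid Responses as Errant: any response faster than the
    threshold is scored as incorrect, regardless of its original score.
    The 3-second threshold is an illustrative assumption."""
    rescored = correct.copy()
    rescored[rt_seconds < threshold] = 0
    return rescored

# Example: a 5-item response vector with one rapid (likely guessed) answer.
correct = np.array([1, 0, 1, 1, 0])
rt = np.array([12.4, 8.1, 1.2, 20.0, 6.5])
print(tarre_rescore(correct, rt))  # the third item is rescored from 1 to 0
```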

Keywords: Rapid guessing, NAEP, response times, TARRE, assessment

Correspondence:
D. B. Wright. e-mail: daniel.wright@unlv.edu


Development and Factorial Validation of the Rapid Assessment Test of Individual Misconceptions About Giftedness (RATIMAG)
Merve Irem Ercan & Albert Ziegler

Abstract: Given the prevalence of myths and misconceptions about giftedness among educators and the general population, this study aims to develop an economical and quick test to measure essential insights into giftedness. The items were developed drawing on a collection of scientific articles dedicated to common misconceptions about giftedness, as well as on subsequent input regarding the construct validity of the items from professionals in the field of giftedness. The final version of the Rapid Assessment Test of Individual Misconceptions about Giftedness (RATIMAG) comprises 20 items. Three subscales measure (1) characteristics and needs of the gifted, (2) assessment and achievements, and (3) personality and social-emotional aspects of the development of giftedness. The RATIMAG was pilot-tested with 494 participants, including university researchers, preservice students in teacher education, and teachers. Exploratory and confirmatory factor analysis, along with parallel analysis, confirmed the factorial validity of the RATIMAG. The participants in this study considered a substantial proportion of the myths to be correct, even with milder scoring.
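A minimal sketch of Horn's parallel analysis as it is commonly implemented: observed eigenvalues of the item correlation matrix are compared with a percentile of eigenvalues obtained from random data of the same shape, and factors are retained while the observed values exceed that benchmark. This is offered as an assumed generic procedure, not the authors' exact analysis.

```python
import numpy as np

def parallel_analysis(data: np.ndarray, n_sims: int = 200,
                      percentile: float = 95, seed: int = 0) -> int:
    """Return the number of factors whose observed eigenvalues exceed the
    chosen percentile of eigenvalues from random data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sim_eig = np.empty((n_sims, p))
    for s in range(n_sims):
        random_data = rng.standard_normal((n, p))
        sim_eig[s] = np.linalg.eigvalsh(
            np.corrcoef(random_data, rowvar=False))[::-1]
    threshold = np.percentile(sim_eig, percentile, axis=0)
    return int(np.sum(obs_eig > threshold))
```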

Keywords: Giftedness, Myths, Knowledge Test, Construct Validity, Factorial Validation, RATIMAG

Correspondence:
Merve Irem Ercan, merveirem.ercan@fau.de


Applicability of Process Models for the Joint Analysis of Responses and Response Times in Complex Cognitive Tasks
Raimund J. Krämer, Jochen Ranger, Marco Koch, Frank M. Spinath, Florian Schmitz

Abstract: Process models for the joint analysis of responses and response times have been developed to disentangle different cognitive processes in experimental paradigms. More recently, they have also been applied to complex tests of intelligence. However, the adequacy of these modelling approaches for such task types has rarely been tested. The present study compared two popular process models, a race model and a diffusion model, with the purely statistical hierarchical model in terms of relative fit to data from typical intelligence tests with varying response formats: a cube rotation test with a binary response format (n = 257), a figural matrix test with a distractor format (n = 229), a figural matrix test with a response construction format (n = 185), and a knowledge test (n = 3142). Compared to the diffusion model, the race and hierarchical models better described the data for all tests but the cube rotation test. Yet neither was able to adequately predict response time quantiles for the matrix construction or knowledge test. Model-based trait estimates displayed only moderate reliability, suggesting limited utility for the assessment of individual differences. This study highlights that process models can be useful for evaluating performance in complex tasks, but it emphasizes the need to carefully consider model assumptions and task requirements.
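A minimal sketch, under assumed illustrative parameter values, of the data-generating process behind a two-boundary diffusion model: accumulated evidence drifts toward an upper (correct) or lower (error) boundary, and the boundary-crossing time plus a non-decision component yields the response time. It is not the specific parameterization or estimation approach used in the study.

```python
import numpy as np

def simulate_diffusion_trial(drift, boundary=1.0, start=0.5,
                             non_decision=0.3, dt=0.001, seed=None):
    """Euler simulation of a two-boundary diffusion process: evidence starts
    at start * boundary and drifts until it crosses the upper (correct) or
    lower (error) boundary; the crossing time plus a non-decision component
    gives the response time."""
    rng = np.random.default_rng(seed)
    x, t = start * boundary, 0.0
    while 0.0 < x < boundary:
        x += drift * dt + rng.standard_normal() * np.sqrt(dt)
        t += dt
    return int(x >= boundary), non_decision + t  # (response, RT in seconds)

response, rt = simulate_diffusion_trial(drift=1.5, seed=7)
print(response, round(rt, 3))
```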

Keywords: response latency, intelligence testing, assessment, model fit

Correspondence:
Raimund J. Krämer, raimund.kraemer@uni-due.de


Modeling Local Item Dependence in PIRLS 2016: Comparing ePIRLS and Paper-Based PIRLS Using the Rasch Testlet Model
Purya Baghaei, Hamdollah Ravand & Rolf Strietholt

Abstract: Testlets are groups of items linked by a common theme or stimulus, such as a reading passage or a diagram. While testlets enhance testing efficiency and contextual relevance, they can also violate the local independence assumption underlying item response theory models. Such violations may bias parameter estimates and artificially inflate reliability coefficients. This study investigates and compares testlet effects in the PIRLS and ePIRLS 2016 assessments. A total of eleven tasks, comprising both ePIRLS and paper-based PIRLS items, were analyzed using data from seven countries. For each country, both the standard unidimensional Rasch model and the Rasch testlet model were fitted separately. The results indicated that testlet variances were generally low across all seven countries. Nonetheless, model fit indices, including the deviance statistic and information criteria, consistently favored the Rasch testlet model over the unidimensional model. The reliability of the general dimension was lower in the testlet model, which is consistent with expectations when accounting for local dependence. Importantly, correlations between item difficulty and person ability estimates from the two models were uniformly high (r = .99), suggesting that the unidimensional model still yields reasonable approximations in the presence of moderate testlet effects. Additionally, the findings revealed that paper-based tasks generated higher levels of local dependence than ePIRLS tasks. Potential explanations for this difference, along with the broader implications for test design and validity, are discussed.
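A short sketch of the Rasch testlet model's item response function, logit P(X_i = 1) = theta + gamma_d(i) - b_i, where gamma_d(i) is a person-specific effect for the testlet containing item i; setting all testlet effects to zero recovers the unidimensional Rasch model. The parameter values below are illustrative assumptions.

```python
import numpy as np

def rasch_testlet_prob(theta: float, b: np.ndarray,
                       testlet_of_item: np.ndarray,
                       gamma: np.ndarray) -> np.ndarray:
    """P(X_i = 1) under the Rasch testlet model:
    logit P = theta + gamma_{d(i)} - b_i, where gamma_{d(i)} is the person's
    effect for the testlet containing item i. With all gamma values fixed at
    zero, the function reduces to the unidimensional Rasch model."""
    logit = theta + gamma[testlet_of_item] - b
    return 1.0 / (1.0 + np.exp(-logit))

# Two testlets of three items each; illustrative parameter values.
b = np.array([-0.5, 0.0, 0.5, -1.0, 0.0, 1.0])
testlet_of_item = np.array([0, 0, 0, 1, 1, 1])
print(rasch_testlet_prob(theta=0.2, b=b,
                         testlet_of_item=testlet_of_item,
                         gamma=np.array([0.4, -0.4])))
```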

Keywords: PIRLS, local dependence, testlet, Rasch testlet model

Correspondence: Purya Baghaei; purya.baghaei@iea-hamburg.de


Equivalent Selective Attention Scores From Different Digital Devices: On the Fairness of Assessments Designed for Smartphones, Tablets, and Desktop Computers
Isabell Baldauf, Maximilian O. Steininger, Georg Mandler & Marco Vetter

Abstract: Organizations are increasingly offering preemployment assessments on different digital devices to evaluate candidates. However, in most cases it remains untested whether the psychometric properties of those assessments are equivalent when different devices are used. Thus, for most assessments, it is unclear whether the scores of candidates who differ in their choice of device can be compared fairly. The aim of this study is to investigate whether a mobile-first cognitive assessment yields equivalent selective attention scores across different devices. Measurement equivalence across device types was tested using data collected from 296 matched participants, who completed the assessment on either a desktop computer or a smartphone. The equivalence of selective attention test scores was investigated using confirmatory multigroup factor analysis. Measurement invariance ensures that test-takers with the same latent trait level have equal probabilities of solving each item and achieve the same scores. The mobile-first design approach resulted in equivalent psychometric properties of the assessment in both groups, as indicated by measurement invariance at all levels of investigation. Thus, measurement invariance was achieved unconditionally for both test groups. These results provide convincing evidence that adhering to mobile-first principles can yield a valid and reliable assessment of selective attention that can be used equivalently on different devices. The study highlights the importance of considering different devices when designing digital assessments in order to avoid systematically disadvantaging candidates because of their choice of device.
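A small numpy sketch of what scalar measurement invariance means in the linear factor model underlying multigroup CFA: when both device groups share loadings and intercepts, test-takers with the same latent trait level have the same expected item scores. The loading and intercept values are illustrative assumptions, not estimates from the study.

```python
import numpy as np

def expected_item_scores(trait: float, loadings: np.ndarray,
                         intercepts: np.ndarray) -> np.ndarray:
    """Expected item scores in a linear factor model: intercept + loading * trait."""
    return intercepts + loadings * trait

# Scalar invariance: both device groups share loadings AND intercepts, so
# test-takers with the same latent trait level have the same expected scores.
loadings = np.array([0.7, 0.6, 0.8])     # illustrative values
intercepts = np.array([2.0, 2.5, 1.8])
desktop = expected_item_scores(1.0, loadings, intercepts)
smartphone = expected_item_scores(1.0, loadings, intercepts)
print(np.allclose(desktop, smartphone))  # True under scalar invariance
```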

Keywords: online assessment, mobile testing, mobile first design, cognitive assessment, selective attention

Correspondence: Isabell Baldauf, Schuhfried GmbH, Hyrtlstraße 45, 2340 Mödling, Austria; baldauf@schuhfried.com

Psychological Test and Assessment Modeling
Volume 67 · 2025 · Issue 1

Pabst, 2025
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)