PTAM 2014-4

CONTENT

Competence-oriented oral examinations: objective and valid
Karl Westhoff & Carmen Hagemeister
Full article .pdf (Diamond Open Access)

The structural validity of the FPI Neuroticism scale revisited in the framework of the generalized linear model
Karl Schweizer & Siegbert Reiß
Full article .pdf (Diamond Open Access)

Constrained Multidimensional Adaptive Testing without intermixing items from different dimensions
Ulf Kroehne, Frank Goldhammer & Ivailo Partchev
Full article .pdf (Diamond Open Access)

Screening reading comprehension in adults: Development and initial evaluation of a reading comprehension measure
René T. Proyer, Michaela M. Wagner-Menghin & Gyöngyi Grafinger
Full article .pdf (Diamond Open Access)

Testing fit of latent trait models for responses and response times in tests
Jochen Ranger & Jörg-Tobias Kuhn
Full article .pdf (Diamond Open Access)

Multiple group cognitive diagnosis models, with an emphasis on differential item functioning
Ann Cathrice George & Alexander Robitzsch
Full article .pdf (Diamond Open Access)

ABSTRACTS

Competence-oriented oral examinations: objective and valid
Karl Westhoff & Carmen Hagemeister

Abstract
The Decision-Oriented Interview (DOI) takes account of behavioral regularities in interviewing and is therefore suitable for all types of interview, including oral examinations. First, we describe how all effective oral examinations examine the content of a certain part of a course of study or the requirements of practice after graduation, and we show that these requirements can always be arranged into a generally valid hierarchy. In an oral examination it is best to start by examining the simplest requirement and then move on through the hierarchy. For an examination to be fair and valid, one needs a description of the basic set of all questions for the specific subject; this makes possible a criterion-oriented measurement of the candidate's competence. The characteristics of the DOI as an oral examination technique are described, and an example is given to show what the sequence of questions in the subject “Psychological Assessment” can look like. Empirical results from well over 1000 oral examinations have shown that the use of the DOI leads to objective and valid assessments. In the conclusion, checklists for the DOI as an oral examination procedure are presented and explained.

Keywords: oral examinations, decision-oriented interview, requirements, hierarchy of requirements, criterion-oriented measurement, checklists, objectivity, validity

Prof.em. Dr. Karl Westhoff
Wiesenstraße 63a
51371 Leverkusen, Germany
mail@karl-westhoff.de


The structural validity of the FPI Neuroticism scale revisited in the framework of the generalized linear model
Karl Schweizer & Siegbert Reiß

Abstract
The structural validity of the FPI Neuroticism scale, which is composed of binary items, is investigated by means of confirmatory factor analysis. Because of the binary nature of the items, a link function is integrated into the measurement model, turning it into a generalized linear model, and probability-based covariances serve as input. The structural investigation reveals that the scale shows a substructure reflecting the contents of the items, which originate from two different domains: the mental and the physical domain. The weighted congeneric bifactor model shows that the general factor is the dominant factor, besides two less prominent factors referring to the mental and physical domains. A sufficient degree of homogeneity is indicated by McDonald's Omega coefficient. The use of factor scores is recommended for the representation of neuroticism.

Keywords: congeneric model, weighted congeneric model, neuroticism, link transformation, probability-based covariances
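
As an illustration of the modeling framework the abstract refers to, the following LaTeX sketch shows a binary-item measurement model with a link function in bifactor form, together with the standard form of McDonald's Omega for a congeneric model; the weighting scheme and exact specification used by the authors are those given in the full article.
\[
  g\bigl(\Pr(X_i = 1)\bigr) \;=\; \tau_i + \lambda_{iG}\,\eta_G + \lambda_{iS}\,\eta_{S(i)},
  \qquad g = \Phi^{-1} \ \text{(probit)} \ \text{or} \ \operatorname{logit},
\]
where \(\eta_G\) denotes the general neuroticism factor and \(\eta_{S(i)}\) the mental or physical domain factor to which item \(i\) belongs; for a simple congeneric model, McDonald's Omega is
\[
  \omega \;=\; \frac{\bigl(\sum_i \lambda_i\bigr)^2}{\bigl(\sum_i \lambda_i\bigr)^2 + \sum_i \theta_{ii}},
\]
with \(\theta_{ii}\) the residual variances.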

Karl Schweizer, PhD
Department of Psychology
Goethe University Frankfurt
Grüneburgplatz 1
60323 Frankfurt a. M., Germany
K.Schweizer@psych.uni-frankfurt.de


Constrained Multidimensional Adaptive Testing without intermixing items from different dimensions
Ulf Kroehne, Frank Goldhammer & Ivailo Partchev

Abstract
Multidimensional adaptive testing (MAT) can improve the efficiency of measuring traits that are known to be highly correlated. Content balancing techniques can ensure that tests fulfill requirements with respect to content areas, such as the number of items from the various dimensions (target rates). However, content balancing does not restrict the order in which items are selected from the dimensions. If multiple dimensions are measured with MAT, intermixing items from different dimensions might invalidate item properties known from calibration studies without mixed item content. To avoid this, the known correlations between the traits can be used to increase the efficiency of ability estimation only, without intermixing items from different dimensions. In this simulation study, MAT allowing items to be intermixed between dimensions is compared to constrained MAT (CMAT), which does not allow intermixing items between dimensions, for items with between-item multidimensionality. As expected, MAT achieved the greatest reliability for equal target rates; however, CMAT, with items administered dimension by dimension in a prespecified order, was not at a disadvantage for unequal target rates.

Keywords: multidimensional adaptive testing, content balancing, item response theory, multidimensional scoring, intermixing items
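
The distinction between intermixed and block-wise item administration can be sketched in a few lines of Python. The sketch below only illustrates the ordering constraint; select_most_informative and update_theta are hypothetical placeholders for the item-selection criterion and the multivariate ability update (which uses the known trait correlations), and the actual algorithms are those specified in the full article.

def run_mat(bank, target_counts, select_most_informative, update_theta):
    """MAT: items from all dimensions may be intermixed, subject only to
    content balancing (target number of items per dimension)."""
    administered, counts = [], {d: 0 for d in target_counts}
    theta = None  # joint multivariate ability estimate
    while sum(counts.values()) < sum(target_counts.values()):
        # candidate pool: unused items whose dimension still needs items
        candidates = [i for i in bank
                      if i not in administered
                      and counts[i.dimension] < target_counts[i.dimension]]
        item = select_most_informative(candidates, theta)
        theta = update_theta(theta, item)  # re-estimate all dimensions jointly
        administered.append(item)
        counts[item.dimension] += 1
    return administered

def run_cmat(bank, target_counts, dimension_order,
             select_most_informative, update_theta):
    """CMAT: dimensions are administered block-wise in a prespecified order,
    so items from different dimensions are never intermixed; the correlations
    between traits still enter the joint ability estimate."""
    administered, theta = [], None
    for dim in dimension_order:
        for _ in range(target_counts[dim]):
            candidates = [i for i in bank
                          if i not in administered and i.dimension == dim]
            item = select_most_informative(candidates, theta)
            theta = update_theta(theta, item)
            administered.append(item)
    return administered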

Ulf Kroehne, PhD
German Institute for International Educational Research
Schloßstraße 29
60486 Frankfurt am Main, Germany
kroehne@dipf.de


Screening reading comprehension in adults: Development and initial evaluation of a reading comprehension measure
René T. Proyer, Michaela M. Wagner-Menghin & Gyöngyi Grafinger

Abstract
Reading comprehension in adults is a rather neglected variable in the practice of psychological assessment. We propose a new screening instrument for adult reading comprehension based on a pragmatic definition of reading comprehension as the textual understanding of the text read. Using data from a calibration sample (n = 266) and a replication sample (n = 148) for cross-validation, we tested the fit of the 1-PL model (Rasch model; graphical model test, Andersen's conditional likelihood-ratio test). Model fit was established and verified in the replication sample after the stepwise exclusion of three (out of 16) items. Correlations with a memory test and with the external criterion reading proficiency were in the expected direction. The comparison of a sub-group of putatively highly skilled readers (n = 59; university students and lecturers) and putatively low-skilled readers (n = 122; participants undergoing psychological assessment to have their driving license reinstated after a ban) showed that a percentile rank < 10 in the measure might indicate insufficient reading skills for practical purposes. Pending further research, the instrument appears useful for screening reading comprehension skills in adults.

Keywords: Computer Aided Testing, Item-Response Theory, Reading Comprehension, Test development
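
For context, Andersen's conditional likelihood-ratio test mentioned above compares the conditional maximum likelihood estimates of the Rasch item parameters obtained in the total sample with those obtained in score-based subgroups. In generic textbook form (not the authors' exact computation),
\[
  Z \;=\; 2\left(\sum_{g=1}^{G} \ln L_c^{(g)}\bigl(\hat{\beta}^{(g)}\bigr) \;-\; \ln L_c\bigl(\hat{\beta}\bigr)\right)
\]
is approximately \(\chi^2\)-distributed with \((G-1)(k-1)\) degrees of freedom, where \(L_c\) is the conditional likelihood, \(k\) the number of items, and \(G\) the number of subgroups; under the Rasch model the item parameters should be invariant across the subgroups.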

Dr. René Proyer
Department of Psychology
University of Zurich
Binzmühlestrasse 14/7
8050 Zurich, Switzerland
r.proyer@psychologie.uzh.ch


Testing fit of latent trait models for responses and response times in tests
Jochen Ranger & Jörg-Tobias Kuhn

Abstract
The joint analysis of responses and response times in psychological tests with latent trait models has become popular recently. Although numerous such models have been proposed, there are so far only few tests of model fit. In this manuscript, a new approach to the evaluation of model fit is presented. The approach is based on the differences between the observed frequencies of positive or negative responses given during fixed time intervals and the corresponding expected frequencies implied by the model. Summing the squared differences yields a test statistic that is approximately chi-square distributed. Different forms of the test can be implemented: jointly considering all items allows for the evaluation of global fit, whereas examining each item separately allows for the assessment of item fit. Depending on the definition of the frequencies, one can test for specific forms of model misfit, e.g., wrong assumptions about the response time distribution, about the relation of responses and response times within the same item, or about the relation of responses and response times from different items. The validity and power of the test are demonstrated in a simulation study: the test adheres to the nominal Type-I error rate and has high power.

Keywords: item response model, response time model, fit test
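
To make the construction of the statistic concrete, the following Python sketch computes a Pearson-type statistic of the kind described for a single item: observed frequencies of correct responses within fixed response-time intervals are contrasted with model-implied expected frequencies. The expected counts and the exact standardization are placeholders here; the article derives the proper weighting and the reference chi-square distribution.

import numpy as np

def fit_statistic(correct, rt, time_edges, expected_counts):
    """Sum of squared standardized differences between observed and expected
    frequencies of correct responses per response-time interval (one item).

    correct         : 0/1 array of item responses
    rt              : response times, same length as `correct`
    time_edges      : bin edges defining the fixed time intervals
    expected_counts : model-implied expected frequencies per interval,
                      computed from the fitted response/response-time model
                      (placeholder input in this sketch)
    """
    bins = np.digitize(rt, time_edges) - 1
    n_bins = len(time_edges) - 1
    observed = np.array([np.sum(correct[bins == b]) for b in range(n_bins)])
    # Pearson-type standardization; the published statistic may weight differently
    return np.sum((observed - expected_counts) ** 2 / np.maximum(expected_counts, 1e-12))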

Jochen Ranger, PhD
Martin-Luther-Universität Halle-Wittenberg
Institut für Psychologie
Brandbergweg 23c
06120 Halle (Saale), Germany
jochen.ranger@psych.uni-halle.de


Multiple group cognitive diagnosis models, with an emphasis on differential item functioning
Ann Cathrice George & Alexander Robitzsch

Abstract
In recent years, cognitive diagnosis models (CDMs) have received growing attention because of their potential to diagnose achievement at the level of sub-competencies. In the context of this development, researchers have introduced relevant tools for the practical application of CDMs, for example multiple group approaches and differential item functioning (DIF) detection. However, when applying CDMs and these related methods to large-scale data, one has to overcome a variety of obstacles: with a growing number of sub-competencies, the models often become (nearly) non-identifiable due to the large number of parameters and are thus extremely hard to estimate. Additionally, significance tests may become significant merely because of the sample size, necessitating adequate effect sizes. The present article aims at two aspects: first, it summarizes existing CDM methods for multiple group models and DIF analyses; second, it gives hints for their application to large-scale assessment data, among other things introducing an adapted estimation routine and an appropriate effect size. Both aspects are illustrated by means of the Austrian educational standards test in mathematics 2012, with a sample of 71,464 students and 72 items.

Keywords: Cognitive Diagnosis Modelling, Multiple Group Models, Differential Item Functioning, Large-Scale Assessment Data
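
To give a concrete idea of what a DIF effect size in a CDM context can look like, one natural candidate (an illustration only; the measure actually proposed is defined in the article) is a weighted mean absolute difference of the group-specific item response probabilities across skill classes:
\[
  \mathrm{DIF}_i \;=\; \sum_{\boldsymbol{\alpha}} w(\boldsymbol{\alpha})\,
  \bigl|\, P_1(X_i = 1 \mid \boldsymbol{\alpha}) - P_2(X_i = 1 \mid \boldsymbol{\alpha}) \,\bigr|,
\]
where the sum runs over the skill classes (attribute patterns) \(\boldsymbol{\alpha}\), \(w(\boldsymbol{\alpha})\) is a weight such as the pooled class probability, and \(P_g\) is the item response function in group \(g\). Unlike a significance test, such a probability-metric quantity does not grow with the sample size.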

Ann Cathrice George, PhD
Federal Institute for Educational Research
Innovation and Development
of the Austrian School System
Salzburg, Austria
a.george@bifie.at

Psychological Test and Assessment Modeling
Volume 56 · 2014 · Issue 4
Pabst, 2014
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)