Measurement Equivalence of Polytomous Personality Tests: A Non-Parametric Differential Item Functioning Analysis

Measurement Equivalence between Respondent Groups: A Non-Parametric Differential Item Functioning Analysis of Polytomous Personality Measures

賴姿伶
Tzu-Ling Lai

DOI: 10.3966/181665042015121104001


Issue: Vol. 11, No. 4, "Testing and Assessment"
Editor: 余民寧 (Min-Ning Yu), Distinguished Professor, Department of Education, National Chengchi University
System ID: vol043_01
Subject: Testing and Assessment
Year of publication: 2015
Page count: 22
Keywords: personality measures; measurement equivalence; differential item functioning (DIF); polytomous items
Affiliation: Assistant Professor, Department of Counseling and Industrial/Organizational Psychology, Ming Chuan University
Word count: 7409
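Areas of expertise: industrial and organizational psychology, psychological testing, personnel selection, internet-based testing, IRT, career guidance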
Submitted: 2015/7/18
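Abstract (Chinese version, translated): Self-report personality tests are often presented as Likert-type polytomously scored items. This response format, however, readily raises the concern that measurement may operate differently for different groups of test takers. For example, when a test is administered for selection, respondents may deliberately answer in the high-scoring direction in order to be hired (commonly called "faking"), so that the results differ from those obtained in other situations. A large body of research has examined whether applicants respond to Likert-type polytomous items differently from students or incumbents, but most of it has approached the question at the level of the whole test, and few studies have analyzed measurement properties at the item level. This study applied the non-parametric polytomous simultaneous item bias test (poly-SIBTEST) to analyze item-level and scale-level measurement equivalence between applicants and incumbents. Several items did exhibit differential item functioning (DIF) across groups of test takers; however, because the DIF did not systematically favor either group, none of the five personality scales showed differential test functioning (DTF). These results indicate that when polytomous personality tests are used in selection settings, they measure the same latent traits as in other settings.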
Abstract (English): Whether applicants respond differently to self-report personality measures when completing them for selection purposes has been a central concern for decades. However, little research has examined item-level measurement properties to identify the effect of the testing situation on polytomous personality items. This study used the non-parametric poly-SIBTEST procedure to investigate both item-level and scale-level measurement equivalence of polytomous Likert-type personality scales between applicants and incumbents. Several items exhibited differential item functioning (DIF); however, because the DIF items did not systematically favor either group, no substantial differential test functioning (DTF) was found in any of the five scales. The items appear to measure the same underlying constructs in applicants and incumbents.
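The abstract's key inference, that DIF items pointing in opposite directions need not add up to DTF, can be illustrated with a toy computation. The Python sketch below is not the procedure used in the study. It computes only an uncorrected SIBTEST-style effect size: the weighted mean difference in studied-item score between reference and focal examinees who are matched on a valid-subtest score. It omits the regression correction and standard error described by Shealy and Stout (1993) and by Chang, Mazzeo, and Roussos (1996). The function name beta_uncorrected, the weighting of matching-score strata by the pooled proportion of examinees, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def beta_uncorrected(match_ref, score_ref, match_foc, score_foc):
    """Uncorrected SIBTEST-style DIF effect size (reference minus focal).

    match_ref / match_foc: matching (valid-subtest) score per examinee.
    score_ref / score_foc: studied item or item-bundle score per examinee.
    Simplifications (assumptions, not the published procedure): no
    regression correction, no standard error, strata weighted by the
    pooled proportion of examinees, strata missing a group are skipped.
    Positive values mean the reference group scores higher than the
    focal group at the same matching score.
    """
    ref, foc = defaultdict(list), defaultdict(list)
    for k, y in zip(match_ref, score_ref):
        ref[k].append(y)
    for k, y in zip(match_foc, score_foc):
        foc[k].append(y)

    shared = [k for k in ref if k in foc]  # strata observed in both groups
    n_total = sum(len(ref[k]) + len(foc[k]) for k in shared)
    beta = 0.0
    for k in shared:
        weight = (len(ref[k]) + len(foc[k])) / n_total
        mean_ref = sum(ref[k]) / len(ref[k])
        mean_foc = sum(foc[k]) / len(foc[k])
        beta += weight * (mean_ref - mean_foc)
    return beta


# Hypothetical toy data: two matching-score strata, two examinees per
# stratum in each group, Likert-type item scores.
match_ref = [1, 1, 2, 2]
match_foc = [1, 1, 2, 2]
item_a_ref, item_a_foc = [2, 3, 3, 4], [1, 2, 2, 3]  # item A favors reference
item_b_ref, item_b_foc = [1, 2, 2, 3], [2, 3, 3, 4]  # item B favors focal

beta_a = beta_uncorrected(match_ref, item_a_ref, match_foc, item_a_foc)
beta_b = beta_uncorrected(match_ref, item_b_ref, match_foc, item_b_foc)

# Bundle score: sum of the two studied items for each examinee.
bundle_ref = [a + b for a, b in zip(item_a_ref, item_b_ref)]
bundle_foc = [a + b for a, b in zip(item_a_foc, item_b_foc)]
bundle = beta_uncorrected(match_ref, bundle_ref, match_foc, bundle_foc)

print(beta_a, beta_b, bundle)  # prints: 1.0 -1.0 0.0
```

Because both items are evaluated on the same examinees, strata, and weights, the bundle statistic in this simplified version equals the sum of the item statistics (1.0 + (-1.0) = 0), which mirrors the abstract's point that DIF without a consistent direction can leave scale-level functioning unchanged.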
References:
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., & Smith, M. A. (2006). A meta-analytic investigation of job applicant faking on personality measures. International Journal of Selection and Assessment, 14(4), 317-335.
Bolt, D., & Stout, W. (1996). Differential item functioning: Its multidimensional model and resulting SIBTEST detection procedure. Behaviormetrika, 23(1), 67-95.
Bollen, K. A. (1989). A new incremental fit index for general structural equation models. Sociological Methods and Research, 17, 303-316.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Beverly Hills, CA: Sage.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456-466.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Chang, H. H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.
Chernyshenko, O. S., Chan, K. Y., Stark, S., Drasgow, F., & Williams, B. (1999, April). Fitting item response theory models to personality data. Paper presented at the 14th Annual Conference of the Society for Industrial and Organizational Psychology, Atlanta, GA.
Douglas, J. A., Roussos, L. A., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33, 465-484.
Drasgow, F., & Hulin, C. L. (1990). Item response theory. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (pp. 577-636). Palo Alto, CA: Consulting Psychologists Press.
Ellingson, J. E., Sackett, P. R., & Connelly, B. S. (2007). Personality assessment across selection and development contexts: Insights into response distortion. Journal of Applied Psychology, 92(2), 386-395.
Ellingson, J. E., Smith, D. B., & Sackett, P. R. (2001). Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86(1), 122-133.
Frei, R. L., Griffith, R. L., McDaniel, M. A., Snell, A. F., & Douglas, E. F. (1997). Faking non-cognitive measures: Factor invariance using multiple groups LISREL. In G. Alliger (Chair), Faking matters. Symposium conducted at the annual meeting of the Society for Industrial and Organizational Psychology, St. Louis, MO.
Griffith, R. L., Chmielowski, T., & Yoshita, Y. (2007). Do applicants fake? An examination of the frequency of applicant faking behavior. Personnel Review, 36, 341-355.
Hogan, J., Barrett, P., & Hogan, R. (2007). Personality measurement, faking, and employment selection. Journal of Applied Psychology, 92(5), 1270-1285.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Lawrence Erlbaum Associates.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581-595.
Hough, L. M., & Schneider, R. J. (1996). Personality traits, taxonomies, and applications in organizations. In K. R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 31-88). San Francisco, CA: Jossey-Bass.
Lord, F. M. (1980). Application of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Maydeu-Olivares, A. (2005). Further empirical results on parametric versus non-parametric IRT modeling of Likert-type personality data. Multivariate Behavioral Research, 40(2), 261-279.
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research and practice in human resources management. In G. Ferris (Ed.), Research in personnel and human resources management (Vol. 13, pp. 153-200). Greenwich, CT: JAI.
O’Brien, E., & LaHuis, D. M. (2011). Do applicants and incumbents respond to personality items similarly? A comparison of dominance and ideal point response models. International Journal of Selection and Assessment, 19(2), 109-118.
Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87(3), 517-529.
Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353-368.
Robie, C., Zickar, M. J., & Schmit, M. J. (2001). Measurement equivalence between applicant and incumbent groups: An IRT analysis of personality scales. Human Performance, 14, 187-207.
Roussos, L. A., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355-371.
Salgado, J. F. (1997). The five factor model of personality and job performance in the European community. Journal of Applied Psychology, 82(1), 30-43.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34, No. 17.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.
Schmit, M. J., & Ryan, A. M. (1993). The Big Five in personnel selection: Factor structure in applicant and non-applicant populations. Journal of Applied Psychology, 78, 966-974.
Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
Smith, D. B., & Ellingson, J. E. (2002). Substance versus style: A new look at social desirability in motivating contexts. Journal of Applied Psychology, 87(2), 211-219.
Somes, G. W. (1986). The generalized Mantel-Haenszel statistic. The American Statistician, 40, 106-108.
Stark, S., Chernyshenko, O. S., Chan, K. Y., Lee, W. C., & Drasgow, F. (2001). Effects of the testing situation on item responding: Cause for concern. Journal of Applied Psychology, 86(5), 943-953.
Viswesvaran, C., & Ones, D. S. (1999). Meta-analyses of fakability estimates: Implications for personality measurement. Educational and Psychological Measurement, 59, 197-210.
Zickar, M. J. (2000). Modeling faking on personality tests. In D. R. Ilgen & C. L. Hulin (Eds.), Computational modeling of behavior in organizations: The third scientific discipline (pp. 95-113). Washington, DC: American Psychological Association.
Zickar, M. J., Gibby, R. E., & Robie, C. (2004). Uncovering faking samples in applicant, incumbent, and experimental data sets: An application of mixed-model item response theory. Organizational Research Methods, 7(2), 168-190.
Zickar, M. J., & Robie, C. (1999). Modeling faking good on personality items: An item-level analysis. Journal of Applied Psychology, 84(4), 551-563.
Zickar, M. J., & Ury, K. L. (2002). Developing an interpretation of item parameters for personality items: Content correlates of parameter estimates. Educational and Psychological Measurement, 62, 19-31.
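賴姿伶, 余民寧, & 徐崇文 (2009). 員工甄選人格量表的編製及其信效度考驗之初步報告 [Development of a personality scale for employee selection: A preliminary report on its reliability and validity]. 教育研究與發展期刊 [Journal of Educational Research and Development], 5(4), 269-304.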