
表現標準設定之擴大參與：教學現場效度證據

Extending Participation in Standard Setting: A Validation Study from School Teachers’ Perspective

林世華;謝佩蓉;謝進昌
Sieh-Hwa Lin;Pei-Jung Hsieh;Jin-Chang Hsieh

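Journal issue:	Vol. 8, No. 4, "Testing and Assessment"; issue editor: Professor 林邦傑, Department of Psychology, National Chengchi University
System ID:	vol031_01
Topic:	Testing and Assessment
Publication year:	2012
Authors:	林世華;謝佩蓉;謝進昌
Authors (English):	Sieh-Hwa Lin; Pei-Jung Hsieh; Jin-Chang Hsieh
Title:	表現標準設定之擴大參與：教學現場效度證據
Title (English):	Extending Participation in Standard Setting: A Validation Study from School Teachers’ Perspective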
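Pages:	18
Keywords (Chinese):	對照組法;表現標準;切截分數;效度覆核
Keywords (English):	contrasting group method; performance standard; cut score; validation
Affiliations:	Associate Professor, Department of Educational Psychology and Counseling, National Taiwan Normal University; Assistant Research Fellow, Research Center for Testing and Assessment, National Academy for Educational Research, and Ph.D., Graduate Institute of Technological and Vocational Education, National Taipei University of Technology; Associate Research Fellow, Research Center for Testing and Assessment, National Academy for Educational Research
Manuscript word count:	9363
Author expertise:	advanced statistics, multivariate statistical analysis, covariance structure analysis, psychological and educational testing and assessment, item response theory
Submission date:	2012/7/24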
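Abstract (translated from the Chinese):	In recent years, thinking about validity evidence has become broader and more diverse. This study uses the examinee-centered contrasting group method to bring classroom teachers into the process, as a validation check on the 2010 fourth-grade science standard setting, which used the item-centered bookmark method. The participants were eight elementary school science teachers from across Taiwan and the 233 fourth-grade students in their classes. The instruments were a science performance-level rating sheet and a single-form standardized science test: teachers used the former to classify each student's science learning performance as below basic, basic, proficient, or advanced, and the latter was administered to the teachers' students to link the results to the existing scale. The generalized partial credit model was used to estimate the students' dichotomous item responses and the teachers' polytomous judgments simultaneously. The results show that the minimum passing standard teachers have in mind is more lenient than the one obtained with the bookmark method, whereas their highest passing standard is stricter. In addition, the teachers' overall hit rate was 52.36%, and the hit rates for the individual performance standards were 26.32%, 57.14%, 58.00%, and 54.55%, which provides a degree of external validity evidence. Finally, several suggestions are offered for future research.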
Abstract (English):	The concept of validity evidence has become diverse and multifaceted in recent years. The purpose of the present study is to examine the external validity of the 4th-grade science assessment standard setting, which was carried out with the bookmark method in 2010. This study uses the contrasting group method, an examinee-centered method, to set performance standards. The participants were eight elementary school teachers and their 233 students. The instruments were a classification sheet and a single-form standardized science test. Teachers were instructed to judge each student's performance against the performance level descriptors and to mark one of four levels on the classification sheet (below basic, basic, proficient, or advanced). To link the teachers' judgments to the existing scale, the science test was administered to the students. The generalized partial credit model was applied to estimate the dichotomous and polytomous data simultaneously. The results revealed that the minimum standard for the basic level set by the contrasting group method was lower than that set by the bookmark method, while the standard for the advanced level set by the contrasting group method was higher than that set by the bookmark method. In addition, the overall hit rate was 52.36%, and the hit rates for the four performance classifications were 26.32%, 57.14%, 58.00%, and 54.55%. Suggestions for further studies are provided in the conclusion.
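Note on hit rates:	The abstract reports an overall hit rate and one hit rate per performance level but does not say how they were computed. The sketch below shows one common reading, assumed here rather than taken from the paper: a "hit" is a student whose teacher-judged level equals the level implied by the cut scores on the test scale, and per-level rates are conditioned on the teacher-judged level. The function names, cut scores, and student data are hypothetical and purely illustrative.

```python
from bisect import bisect_right
from collections import Counter

# Level labels as listed in the abstract; index 0 is the lowest level.
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def classify(score, cut_scores):
    """Return the level index for a scale score, given ascending cut scores.

    A score equal to a cut score is placed in the higher level.
    """
    return bisect_right(cut_scores, score)


def hit_rates(teacher_levels, scores, cut_scores):
    """Overall and per-level agreement between teacher and test classification.

    Assumption: per-level rates are conditioned on the teacher-judged level.
    """
    hits, totals = Counter(), Counter()
    for judged, score in zip(teacher_levels, scores):
        totals[judged] += 1
        if classify(score, cut_scores) == judged:
            hits[judged] += 1
    overall = sum(hits.values()) / len(teacher_levels)
    per_level = {LEVELS[k]: hits[k] / totals[k] for k in sorted(totals)}
    return overall, per_level


# Hypothetical toy data (not from the study): three cut scores, six students.
cut_scores = [200.0, 250.0, 300.0]
teacher_levels = [0, 1, 1, 2, 3, 2]
scores = [180.0, 210.0, 260.0, 270.0, 310.0, 240.0]

overall, per_level = hit_rates(teacher_levels, scores, cut_scores)
print(round(overall, 4), per_level)
```

Conditioning on the test-based level instead of the teacher-judged level would change the per-level figures but not the overall rate.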