
Evaluating the Utility of Different Item Presentation and Feedback Approaches with the Modified Angoff Method

Yi-Fang Wu (吳宜芳); Hueying Tzou (鄒慧英)

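Journal issue:	Vol. 6, No. 4, themed issue "Testing and Assessment"; editor: 林世華, Department of Educational Psychology and Counseling, National Taiwan Normal University
System ID:	vol023_02
Topic:	Testing and Assessment
Publication year:	2010
Authors:	Yi-Fang Wu (吳宜芳); Hueying Tzou (鄒慧英)
Title (translated from Chinese):	A Comparative Study of the Effectiveness of Item Presentation and Feedback Modes in Improving the Consistency of Angoff Standard-Setting Results
Title (English):	Evaluating the Utility of Different Item Presentation and Feedback Approaches with the Modified Angoff Method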
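Pages:	34
Keywords:	Angoff method; item-grouping; Reckase charts; standard setting
Affiliations:	Doctoral student, Educational Measurement and Statistics, University of Iowa, USA; Professor, Graduate Institute of Measurement and Statistics, National University of Tainan
Word count:	17,581
Authors' expertise:	Testing and assessment
Submission date:	2010/10/20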
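Abstract (translated from Chinese):	Among the many standard-setting methods, the Angoff method and its variations, extensions, and modified procedures are among the most widely used in educational practice. However, panelists using the Angoff method face considerable cognitive challenges when they conceptualize the minimally competent examinee and estimate that examinee's probability of answering each item correctly. Item characteristics such as item difficulty can affect inter-panelist and intra-panelist consistency, and thereby the validity of the resulting standard. This study therefore incorporated three practices into the modified Angoff procedure, namely feedback on the rank order of empirical p-values, feedback with Reckase charts, and presenting items either grouped or ungrouped, in order to improve the consistency of the results and to compare the merits of these practices. The standard setting was conducted after the test had been administered, so it is of the post-administration decision type. The study's distinctive feature is that it examines how different feedback modes, and whether items are presented in groups, affect standard-setting results, and thereby compares the two approaches. In addition, the two modifications are intended to make panelists more aware of item difficulty, improving inter- and intra-panelist consistency and the consistency of the results, and supporting the validity of the standard; this is the study's practical contribution.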
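The abstract mentions feedback based on the rank order of empirical p-values. One plausible way to build such feedback, sketched here under stated assumptions, is to list items from easiest to hardest by empirical p-value next to the rank that each panelist's ratings imply, and to summarize the agreement with Spearman's rho. The function names, the toy item IDs (Q1 to Q4), and the choice of Spearman's rho are illustrative assumptions, not the procedure documented in the study.

```python
"""Hypothetical sketch of empirical p-value ranking feedback for one panelist.

Assumptions (not from the study): ratings and p-values are proportions in
[0, 1], items are keyed by made-up IDs, and Spearman's rho summarizes how well
the panelist's rating order matches the empirical difficulty order.
"""
from typing import Dict, List, Tuple


def ranks(values: List[float]) -> List[float]:
    """1-based ascending ranks; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def ranking_feedback(ratings: Dict[str, float],
                     p_values: Dict[str, float]) -> Tuple[List[Tuple[str, int, float]], float]:
    """List items from easiest to hardest by empirical p-value, paired with the
    rank the panelist's ratings give them (1 = rated easiest)."""
    items = sorted(p_values, key=p_values.get, reverse=True)
    rating_ranks = ranks([-ratings[it] for it in items])
    rows = [(it, pos + 1, rating_ranks[pos]) for pos, it in enumerate(items)]
    rho = spearman([ratings[it] for it in items], [p_values[it] for it in items])
    return rows, rho


if __name__ == "__main__":
    p_values = {"Q1": 0.85, "Q2": 0.40, "Q3": 0.65, "Q4": 0.20}  # toy data
    ratings = {"Q1": 0.80, "Q2": 0.55, "Q3": 0.50, "Q4": 0.25}   # toy data
    rows, rho = ranking_feedback(ratings, p_values)
    for item, empirical_rank, rating_rank in rows:
        flag = "  <- order differs" if empirical_rank != rating_rank else ""
        print(f"{item}: empirical rank {empirical_rank}, rating rank {rating_rank:g}{flag}")
    print(f"Spearman rho = {rho:.2f}")
```

Items whose two ranks differ are the ones a panelist might revisit in the next round. The sketch leaves open what counts as a meaningful discrepancy.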
Abstract (English):	Numerous standard-setting methods have been developed to help panels estimate the performance of borderline examinees. Among them, the Angoff method is one of the most popular judgmental standard-setting procedures, and its extensions, modifications, and variations are widely applied in practice. Panelists play a central role in standard setting, especially in judgmental methods such as the Angoff method and its variations. Their ability to estimate borderline examinees' performance accurately depends to some extent on item difficulty, and if that accuracy is in doubt, the validity of the resulting performance standard is undermined. A variety of procedures and types of feedback have therefore been developed to reduce inconsistency among panelists and within individual panelists. To compare different procedures embedded in the modified Angoff method for establishing cut scores on a large-scale achievement assessment, we designed two standard-setting activities that integrated different procedures to help panelists make more accurate estimates. Two data sets from a national mathematics achievement assessment in Taiwan were used, each containing 104 operational multiple-choice items measuring students' grade-level mathematics ability. Twelve panelists took part in the Grade 4 activity and 14 in the Grade 6 activity. All were mathematics educators, and some had prior experience with the modified Angoff procedure. The design crossed two factors with two conditions each: items presented with or without prior grouping, and the type of feedback, either empirical p-values or IRT calibration with Reckase charts (Reckase, 1998, 2001).
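Reckase charts (Reckase, 1998, 2001) tabulate, for each item, the IRT-model probability of a correct response at a range of ability levels, so a panelist whose Angoff ratings all fall near one row of the table is rating consistently with a single borderline ability. The sketch below illustrates that idea and also computes the raw Angoff cut score as the sum of a panelist's item ratings, the usual Angoff aggregation. It assumes a 3PL model with D = 1.7, a theta grid from -3 to 3, and made-up item parameters. The helper names (p_3pl, reckase_chart, implied_thetas, summarize) are hypothetical, and the sketch does not reproduce the chart format or software used in the study.

```python
"""Hypothetical sketch of a Reckase-chart style consistency check.

Assumptions (not from the study): items follow a 3PL model with scaling
constant D = 1.7, item parameters are made up, and the chart uses a theta grid
from -3 to 3 in steps of 0.1.
"""
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

D = 1.7  # assumed logistic scaling constant

Item = Tuple[float, float, float]  # (a, b, c) 3PL parameters


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def theta_grid(lo: float = -3.0, hi: float = 3.0, step: float = 0.1) -> List[float]:
    """Evenly spaced ability points, rounded to suppress float drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + k * step, 10) for k in range(n + 1)]


def reckase_chart(items: List[Item], grid: List[float]) -> List[List[float]]:
    """Rows are theta points, columns are items, cells are model probabilities."""
    return [[p_3pl(t, *item) for item in items] for t in grid]


def implied_thetas(ratings: List[float], chart: List[List[float]],
                   grid: List[float]) -> List[float]:
    """For each item, the theta row whose probability is closest to the rating."""
    out = []
    for j, r in enumerate(ratings):
        row = min(range(len(grid)), key=lambda k: abs(chart[k][j] - r))
        out.append(grid[row])
    return out


def summarize(ratings: List[float], chart: List[List[float]],
              grid: List[float]) -> Dict[str, float]:
    """Raw Angoff cut (sum of ratings) plus the spread of implied thetas;
    a small spread means the ratings sit on roughly one row of the chart."""
    thetas = implied_thetas(ratings, chart, grid)
    return {"raw_cut": sum(ratings), "mean_theta": mean(thetas),
            "sd_theta": pstdev(thetas)}


if __name__ == "__main__":
    items = [(1.0, -1.0, 0.20), (1.2, 0.0, 0.20), (0.8, 1.0, 0.25)]  # toy items
    grid = theta_grid()
    chart = reckase_chart(items, grid)
    consistent = [p_3pl(0.0, *it) for it in items]  # all on the theta = 0 row
    uneven = [0.95, 0.60, 0.30]                     # drifts across rows
    for label, r in [("consistent", consistent), ("uneven", uneven)]:
        print(label, summarize(r, chart, grid))
```

Matching each rating to the nearest grid row keeps the sketch simple and does not fail for a rating below an item's lower asymptote (it returns the lowest grid point), at the cost of a resolution limited by the grid step.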
We used a generalizability theory design to examine how much each of these procedures improved consistency, focusing on the item effect, the item difficulty effect (both within and between difficulty levels), and the panelist effect. First, the percentage of variance attributable to the item effect increased consistently from Round 1 to Round 3, while the percentage attributable to the panelist effect decreased across rounds. Panelist consistency therefore improved, and the procedure with Reckase-chart feedback removed relatively more panelist variability. Second, with or without item grouping, panelists' estimates for items of similar difficulty became more similar as the rounds progressed. Finally, combining item grouping with Reckase-chart feedback produced the greatest improvement in intrajudge consistency: under this condition the root mean square error estimates were the smallest, and the generalizability coefficients and intraclass correlation coefficients (ICCs) were the highest. Panelists can distinguish hard items from easy ones; even so, grouping items by difficulty and providing Reckase-chart feedback reduced, as far as possible, the difficulty-induced variability that affects panelist consistency. This finding supports the validity of the resulting standard.
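The abstract reports the share of rating variance due to items and panelists, together with generalizability coefficients and ICCs. As a rough illustration of how such quantities come out of a rating matrix, the sketch below estimates variance components for a simple fully crossed panelist × item design with one rating per cell. The function name p_by_i, the toy 2 × 2 matrix, and the particular G-coefficient and ICC forms (a relative G coefficient with items as objects of measurement, and Shrout-Fleiss ICC(2,1)) are assumptions. The study's own design also involved difficulty levels and rounds, which this sketch omits, and it may have used different estimators. The sketch also assumes non-degenerate data, since an all-constant matrix would divide by zero.

```python
"""Hypothetical sketch: variance components for a fully crossed
panelist x item (p x i) design with one rating per cell.

Assumptions: random-effects ANOVA estimators, negative estimates truncated at
0, and a toy 2 x 2 rating matrix. The study's own design also had item
difficulty levels and rounds, which this sketch omits.
"""
from typing import Dict, List


def p_by_i(x: List[List[float]]) -> Dict[str, float]:
    n_p, n_i = len(x), len(x[0])  # panelists (rows) and items (columns)
    grand = sum(map(sum, x)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in x]
    i_means = [sum(row[i] for row in x) / n_p for i in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_tot = sum((v - grand) ** 2 for row in x for v in row)
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = (ss_tot - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # panelist (leniency) effect
    var_i = max((ms_i - ms_res) / n_p, 0.0)   # item effect
    var_res = max(ms_res, 0.0)                # p x i interaction + error
    total = var_p + var_i + var_res
    return {
        "pct_p": 100 * var_p / total,
        "pct_i": 100 * var_i / total,
        "pct_res": 100 * var_res / total,
        # relative G coefficient with items as objects of measurement,
        # averaged over n_p panelists (one common form, assumed here)
        "g_items": var_i / (var_i + var_res / n_p),
        # Shrout-Fleiss ICC(2,1): items are targets, panelists are raters
        "icc_2_1": (ms_i - ms_res)
        / (ms_i + (n_p - 1) * ms_res + n_p * (ms_p - ms_res) / n_i),
    }


if __name__ == "__main__":
    ratings = [[0.1, 0.2],   # panelist 1, items 1-2 (toy values)
               [0.3, 0.6]]   # panelist 2
    for name, value in p_by_i(ratings).items():
        print(f"{name}: {value:.3f}")
```

In this toy matrix the panelist component accounts for most of the variance. In the pattern the abstract describes, the item share would grow and the panelist share would shrink across rounds.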
References:
吳裕益 (1986). A study of methods for setting passing scores on criterion-referenced tests [in Chinese] (Unpublished doctoral dissertation). Graduate Institute of Education, National Chengchi University.
吳裕益 (1988). A study of methods for setting passing scores on criterion-referenced tests [in Chinese]. 測驗年刊, 35, 159-166.
鄭明長, & 余民寧 (1994). A comparison of various methods for setting passing scores [in Chinese]. 測驗年刊, 41, 19-40.
Allen, N. L., Jenkins, F., Kulick, E., & Zelenak, C. A. (1997). Technical report of the NAEP 1996 state assessment program in mathematics. Washington, DC: National Center for Education Statistics.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education.
Berk, R. A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56(1), 137-172.
Brandon, P. R. (2004). Conclusions about frequently studied modified Angoff standard-setting topics. Applied Measurement in Education, 17(1), 59-88.
Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2002). A comparison of Angoff and Bookmark standard setting methods. Journal of Educational Measurement, 39(3), 253-263.
Cizek, G. J. (2001). Conjectures on the rise and call of standard setting: An introduction to context and practice. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 1-17). Mahwah, NJ: Lawrence Erlbaum Associates.
Cizek, G. J. (2006). Standard setting. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 225-258). Mahwah, NJ: Lawrence Erlbaum Associates.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.
Clauser, B. E., Swanson, D. B., & Harik, P. (2002). Multivariate generalizability analysis of the impact of training and examinee performance information on judgments made in an Angoff-style standard-setting procedure. Journal of Educational Measurement, 39(4), 269-290.
Ferdous, A. A., & Plake, B. S. (2005). Understanding the factors that influence decisions of panelists in a standard-setting study. Applied Measurement in Education, 18(3), 257-267.
Goodwin, L. D. (1999). Relations between observed item difficulty levels and Angoff minimum passing levels for a group of borderline examinees. Applied Measurement in Education, 12, 13-28.
Hambleton, R. K. (2001). Setting performance standards on educational assessments and criteria for evaluating the process. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 89-116). Mahwah, NJ: Lawrence Erlbaum Associates.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433-470). Washington, DC: American Council on Education.
Impara, J. C., & Plake, B. S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement, 34(4), 353-366.
Jaeger, R. M. (1995). Setting performance standards through two-stage judgmental policy capturing. Applied Measurement in Education, 8(1), 15-40.
Kane, M. (1987). On the use of IRT models with judgmental standard setting procedures. Journal of Educational Measurement, 24(4), 333-345.
Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64(3), 425-461.
Lorge, I., & Kruglov, L. K. (1953). The improvement of the estimates of test difficulty. Educational and Psychological Measurement, 13, 34-46.
MacCann, R. G., & Stanley, G. (2006, January). The use of Rasch modeling to improve standard setting. Practical Assessment, Research & Evaluation, 11(2). Retrieved from http://pareonline.net/pdf/v11n2.pdf
McLaughlin, D. H. (1993). Validity of the 1992 NAEP achievement-level setting process. In L. Shepard, R. Glaser, R. Linn, & G. Bohrnstedt (Eds.), Setting performance standards for student achievement tests: Background studies (pp. 81-122). Stanford, CA: National Academy of Education.
Matter, J. D. (2000). Investigation of the validity of the Angoff standard setting procedure for multiple-choice items (Unpublished doctoral dissertation). University of Massachusetts, Amherst, MA.
Maurer, T. J., Alexander, R. A., Callahan, C. M., Bailey, J. J., & Dambrot, F. H. (1991). Methodological and psychometric issues in setting cutoff scores using the Angoff method. Personnel Psychology, 44, 235-262.
National Assessment Governing Board. (2006). Writing framework and specifications for the 2007 National Assessment of Educational Progress. Washington, DC: National Assessment Governing Board.
Pitoniak, M. J. (2003). Standard setting methods for complex licensure examinations (Unpublished doctoral dissertation). University of Massachusetts, Amherst, MA.
Plake, B. S., & Impara, J. C. (2001). Ability of panelists to estimate item performance for a target group of candidates: An issue in judgmental standard setting. Educational Assessment, 7(3), 87-97.
Plake, B. S., & Melican, G. J. (1989). Effects of item context on intrajudge consistency of expert judgments via the Nedelsky standard setting method. Educational and Psychological Measurement, 49(1), 45-51.
Plake, B. S., Melican, G. J., & Mills, C. N. (1991). Factors influencing intrajudge consistency during standard-setting. Educational Measurement: Issues and Practice, 10(2), 15-25.
Raymond, M. R., & Reid, J. B. (2001). Who made thee a judge? Selecting and training participants for standard setting. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 119-157). Mahwah, NJ: Lawrence Erlbaum Associates.
Reckase, M. D. (1998). Setting standards to be consistent with an IRT item calibration. Iowa City, IA: ACT.
Reckase, M. D. (2000). The ACT/NAGB standard setting process: How "modified" does it have to be before it is no longer a modified-Angoff process? Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA. (ERIC ED442825)
Reckase, M. D. (2001). Innovative methods for helping standard-setting participants to perform their task: The role of feedback regarding consistency, accuracy, and impact. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 159-173). Mahwah, NJ: Lawrence Erlbaum Associates.
Reckase, M. D. (2006). Some criteria for evaluating the functioning of standard-setting methods with application to bookmark and modified Angoff methods. Educational Measurement: Issues and Practice, 25(2), 4-18.
Schraw, G., & Roedel, T. D. (1994). Test difficulty and judgment bias. Memory & Cognition, 22(1), 63-69.
Shepard, L. A. (1995). Implications for standard setting of the National Academy of Education evaluation of National Assessment of Educational Progress achievement levels. In Proceedings of the Joint Conference on Standard Setting for Large-Scale Assessments. Washington, DC: National Assessment Governing Board and National Center for Education Statistics.
Shepard, L., Glaser, R., Linn, R., & Bohrnstedt, G. (1993). Setting performance standards for student achievement tests. Stanford, CA: National Academy of Education.
Sireci, S. G., & Biskin, B. H. (1992). A survey of national professional licensure examination programs. CLEAR Exam Review, 3, 21-25.
Smith, R. L., & Smith, J. K. (1988). Differential use of item information by judges using Angoff and Nedelsky procedures. Journal of Educational Measurement, 25(4), 259-274.
Taube, K. T. (1997). The incorporation of empirical item difficulty data into the Angoff standard-setting procedure. Evaluation & the Health Professions, 20, 479-498.
van der Linden, W. J. (1982). A latent trait method for determining intrajudge inconsistency in the Angoff and Nedelsky techniques of standard setting. Journal of Educational Measurement, 19(4), 295-308.
van der Linden, W. J. (1986). A latent trait method for determining intrajudge inconsistency in the Angoff and Nedelsky techniques of standard setting (Addendum). Journal of Educational Measurement, 23(3), 265-266.
Verhoeven, B. H., van der Steeg, A. F. W., Scherpbier, A. F. F. A., Muijtjens, A. M. M., Verwijnen, & van der Vleuten, C. P. M. (1999). Reliability and credibility of an Angoff standard setting procedure in progress testing using recent graduates as judges. Medical Education, 33, 832-837.
Wuensch, K. L. (2003). Inter-rater agreement. Retrieved from http://core.ecu.edu/psyc/wuenschk/docs30/InterRater.doc