

  
Idaho State University's One-page
Newsletter for Teaching Excellence

Volume 13, Number 2, February, 2005
Center for Teaching and Learning
Museum 434 Campus Box 8010
Pocatello, ID 83209-8010

 
Phone (208)282-4703
FAX (208)282-5361
nuhfed@isu.edu

 

 
  

Assessment: Test Reliability and Its Implications, Part 2


More on Reliability. In our last issue, we looked at the concept of test reliability and correlation coefficients. Generally, the longer the test and the more students involved, the more meaningful the coefficient. Why aren’t tests perfectly reliable? Think of the neural networks that contain particular knowledge (such as the content of a course or a unit of a course) as a rough surface, like a large area of Earth’s surface with its naturally rough topography. A test is like taking sample measurements across this surface. If two tests each yield a good representation of the surface, then their results should correlate very highly with one another. A problem, though, lies in the inherent roughness of the three-dimensional, interconnected, interfolded, branching neural networks produced through learning. This almost ensures that tests are imperfect samples of the actual knowledge stored within those networks. In itself, this illustrates why legitimate assessment of learning requires multiple measures, not just test scores or grades. A test of a class samples not one brain “surface,” but many, so one can recognize why writing a good test is a challenge. With revision, tests can be optimized, but we faculty don’t have the luxury of tuning tests until they are fit for marketing. We must write our routine tests for one-time use, without tuning based on trial runs.
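The idea that two good samples of the same knowledge should correlate with one another can be tried on a single test by splitting it in half. The sketch below is only an illustration and is not a procedure described in this note: it scores the odd and even items separately, correlates the two half scores, and then applies the standard Spearman-Brown step-up to estimate the reliability of the full-length test. The item scores and the names `item_scores`, `pearson_r`, `r_half`, and `r_full` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical data: 6 students x 8 items, 1 = correct, 0 = incorrect.
item_scores = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 1],
]

# Each student's total on the odd-numbered and even-numbered items.
odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, 8

# Correlation between the two halves of the same test.
r_half = pearson_r(odd_half, even_half)

# Spearman-Brown step-up: estimated reliability of the full 8-item test.
r_full = 2 * r_half / (1 + r_half)

print("half-test correlation:", round(r_half, 3))
print("estimated full-test reliability:", round(r_full, 3))
```

With these made-up scores the two halves correlate at about 0.45, and the stepped-up estimate for the full test is a little over 0.6. Six students and eight items are far too few for a meaningful coefficient; the numbers are kept small only so the arithmetic can be checked by hand.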


Individual test questions trigger responses from students to supply information or to use information in a higher-level thinking challenge, such as synthesis or evaluation. Different learners perceive knowledge differently, and their brains retrieve it a bit differently. If information is retrieved differently, an individual test question may trigger a response in some students and not in others, even though all may have the knowledge. In teaching, we know that approaching material in as many ways as possible accommodates the varied learning styles inherent in different students’ neural wiring. Good test design must take student learning/recall diversity into account, just as good instructional design does. A "good" test will efficiently trigger responses from as many as possible of the people who have the knowledge. Our next Nutshell Note will deal with ways to write more reliable tests.


Implications. What instructional practices are most effective in producing learning? How well are student ratings of professors tied to students’ learning? Educational research that seeks answers to such questions compares test scores with varied practices or with student ratings. Faculty often see correlations such as r = 0.47 between student ratings and test performance (Cohen, 1981, Review of Educ. Res., v. 51, pp. 281-309), or r = 0.56 between test performance and the degree to which teachers prepare and organize their courses (Feldman, 1998, Teaching and Learning in the College Classroom 2nd ed., pp. 391-414). Faculty who lack awareness of test reliability are prone to judge these as "low correlations" and to presume, erroneously, that they result from fogginess of student ratings or from a lack of real importance of course organization, when part of the problem lies in the tests. When we get an r-value such as 0.47 between student ratings and test performance, part of the imprecision comes from the ratings and part comes from the tests themselves. In fact, measures of internal reliability of class tests may show that the tests do not correlate much better with themselves than they do with other good measures. Before we can use our tests to make any comparisons with other measures, we need to estimate the reliability of our tests quantitatively. When is a numerical relationship good enough to be useful? Cashin (1988, Kansas State Univ., Idea Paper n. 20) recognized: "Correlations between .20 and .49 are practically useful. Correlations between .50 and .70 are very useful but they are rare when studying complex phenomenon." The nature of test reliability helps us to understand why correlations in educational research are not higher. Given the "wobble" associated with tests, Cashin’s “very useful” values are as good as we can expect to obtain by pairing another measure with routine class exams.
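One way to see why imprecise tests cap the correlations we can observe is the classical attenuation relationship from test theory, which this note does not state and which is brought in here as an assumption: the observed r is roughly the true r multiplied by the square root of the product of the two measures' reliabilities. The sketch below uses hypothetical reliabilities chosen only for illustration; the function name `max_observed_r` is likewise made up.

```python
import math

def max_observed_r(reliability_x, reliability_y, true_r=1.0):
    """Classical attenuation: observed r = true r * sqrt(rel_x * rel_y)."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliability must lie between 0 and 1")
    return true_r * math.sqrt(reliability_x * reliability_y)

# Hypothetical values: a routine class exam with reliability 0.60
# paired with a student-ratings form with reliability 0.80.
ceiling = max_observed_r(0.60, 0.80)

print("largest r we could observe:", round(ceiling, 2))
```

Under these assumed reliabilities, even a perfect underlying relationship between learning and ratings would appear as an r of only about 0.69, which falls within the range Cashin calls "very useful."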


Correlations work best when there is a range of scatter of both sets of data under comparison, and there are enough data points to make a correlation meaningful. Without such a range, some absurdities can result. Consider for instance a situation in which tests reveal that students learned little and student evaluations confirm as much. Data like this are likely to condense into such little scatter that aberrant points unduly influence the trend. Unrepresentative correlations can result despite high agreement in the actual situation. Consider the opposite situation in which nearly every student got an "A" and all students agree the learning that took place was high; unrepresentative correlation may again result for the same reason. In such cases, other statistical tools are needed.
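The first situation described above can be sketched with a few made-up numbers. In this hypothetical class, every student scored low on the exam and rated the learning low, so the two measures agree, yet the scores sit in such a tight cluster that one aberrant student sets the trend. The data, the percent and 1-to-5 scales, and the helper `pearson_r` are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r; returns None when either list has no scatter at all."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / denom

# Hypothetical class: low exam scores (percent) and low ratings of
# learning (1-5 scale). The two measures agree that little was learned.
scores = [50, 51, 52, 50, 51, 52]
ratings = [2.0, 2.1, 2.0, 2.1, 2.0, 2.1]
r_cluster = pearson_r(scores, ratings)

# Add one aberrant student: a high exam score with a very low rating.
r_with_outlier = pearson_r(scores + [90], ratings + [1.0])

print("r within the tight cluster:", r_cluster)
print("r after one aberrant point:", r_with_outlier)
```

Within the cluster there is essentially no trend (r near 0), while the single added point drives r close to -1. Neither value reflects the actual situation, in which tests and ratings agree.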

Assessing along the Continuum of Students' Learning

nears closing with 108 registrants as of 2/1/2005! Only a very few slots and books now remain.

Dr. Peggy Maki
Friday, February 25, 8:30 A.M. to about 3:00 P.M., Red Lion Inn by I-15 Pocatello Creek Road Exit
Breakfast & Lunch provided
Early Registrants Receive Assessing for Learning: Building a Sustainable Commitment Across the Institution, 2004, Stylus Press, 204 p.

To register, email to nuhfed@isu.edu and give your ISU mail box number

Beginning with research on learning, this workshop will present collaborative principles, practices, and strategies for assessing student learning at the institution and department levels as students progress through their studies. The workshop will demonstrate collaborative steps involved in assessing student learning.

 

 

Who’s Peggy Maki?

Higher education consultant, Peggy L. Maki, Ph.D., specializes in assisting institutions to integrate assessment of student learning into educational practices, processes and structures. Her work also focuses on assessment within the context of accreditors' expectations for institutional effectiveness. She has recently been named to the Board of Contributors of About Campus, Department Editor of Assessment for About Campus, Assessment Field Editor at Stylus Publishing, LLC, and to the Advisory Board of the Wabash Center for Critical Inquiry. She serves as a faculty member in AAC&U's Institute on General Education; this past summer she served as a faculty member in the Carnegie Foundation's Integrated Learning Project. Beginning in the Summer, 2005, she will be teaching graduate seminars focused on assessment at two universities.

Formerly, Senior Scholar and Director of Assessment at the American Association for Higher Education (AAHE), she has served as Associate Director of the Commission on Institutions of Higher Education, New England Association of Schools and Colleges, Inc., New England’s regional accrediting body; Vice President, Academic Dean, Dean of Faculty, and Professor of English, Bradford College, MA; Chair of English, Theatre Arts, and Communication, Associate Professor of English, and Dean of Continuing Education, Arcadia University, PA. She is a recipient of a national teaching award, the Lindback Award for Distinguished Teaching.

She has conducted over 300 workshops and keynote addresses on assessment both in the U.S. and abroad, including New Zealand, Hong Kong, Mexico, Greece, Bulgaria, British Columbia, and Malaysia. Her articles on assessing student learning have appeared in AAHE’s Bulletin, AAHE’s Inquiry and Action series, About Campus, Assessment Update, Change Magazine, The Journal of Academic Librarianship, NetResults, and Proceedings of the International Conference on Teaching and Learning, held at the National University of Singapore (keynote address). Her writing also includes articles, chapters in books, and a book on the teaching of writing. Additionally, she conducts writing-across-the-curriculum workshops that develop and document student learning.

She is in the process of editing a book on assessment practices at the doctoral level and developing a workbook to accompany her recently published handbook on assessment: Assessing for Learning: Building a Sustainable Commitment across the Institution, published in June, 2004, by Stylus Publishing, LLC, and AAHE.

 

 
       
      