Assessment:
Test Reliability and Its Implications, Part 2
More on Reliability. In our last issue, we looked
at the concept of test reliability and correlation coefficients.
Generally, the longer the test and the more students involved, the
more meaningful the coefficient. Why aren't tests perfectly
reliable? Think of neural networks that contain particular knowledge
(such as content of a course or a unit of a course) as a rough surface, like
a large area of Earth's surface with its naturally rough topography.
A test is like taking sample measurements across this surface. If
two tests separately yield samples of knowledge that are good representations
of the surface, then they should both correlate very highly with
one another. A problem, though, lies in the inherent roughness of
the three-dimensional, interconnected, interfolded, branching neural
networks produced through learning. This almost ensures that tests
are imperfect samples of the actual knowledge stored within them.
In itself, this illustrates why legitimate assessment of learning
requires multiple measures, not just test scores or grades.
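The idea that two good samples of the same knowledge should correlate highly can be tried on a single test by splitting it in half. The sketch below is only an illustration: the item scores are invented, and the odd/even split with the Spearman-Brown step-up is one common way (not the only one) to estimate internal reliability. The function names `pearson_r` and `split_half_reliability` are hypothetical helpers written for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown formula to estimate reliability of the full-length test."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Invented data: 6 students x 8 items, 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(scores)
print(f"half-test correlation: {r_half:.2f}")
print(f"estimated full-test reliability: {r_full:.2f}")
```

With only six students and eight items, the coefficient here means very little, which is the point made above about test length and class size. The step-up formula also shows why a longer test tends to be more reliable than either of its halves.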
A test of a class samples not one brain surface, but
many, so one can recognize why writing a good test is a challenge.
With revision, tests can be optimized, but we faculty don't
have the luxury of tuning tests until fit for marketing. We must
write our routine tests for one-time use, without tuning based on
trial runs.
Individual test questions trigger responses from students to supply
information or to use information to engage in a higher-level thinking
challenge, such as synthesis or evaluation. Different learners perceive
knowledge differently, and their brains retrieve it a bit differently.
If information is retrieved differently, an individual test question
may trigger a response in some students and not in others, even
though all may have the knowledge. In teaching, we know that approaching
material in as many ways as possible accommodates the
varied learning styles inherent in different students' neural
wiring. Good test design must take student learning/recall diversity
into account, just as good instructional design does. A "good"
test will efficiently trigger responses from as many as possible of the
students who have the knowledge. Our next Nutshell Note will deal with ways
to write more reliable tests.
Implications. What instructional practices are most
effective in producing learning? How well are student ratings of
professors tied to students' learning? Educational research
that seeks answers to such questions compares test
scores with varied practices or student ratings. Faculty often see
correlations such as r = 0.47 between student ratings and test performance
(Cohen, 1981, Review of Educ. Res., v. 51, pp. 281-309), or r
= 0.56 between test performance and the teachers' degree of preparing
and organizing their courses (Feldman, 1998, Teaching and Learning
in the College Classroom 2nd ed., pp. 391-414). Faculty who lack
awareness of test reliability are prone to judge these as "low
correlations" and erroneously presume that they result from
fogginess of student ratings or the lack of real importance of course
organization, rather than recognizing that part of the problem lies in the tests.
When we get an r-value such as 0.47 between student ratings and
test performance, part of the imprecision comes from imprecision
in ratings and part of the imprecision comes from the tests themselves.
In fact, measures of internal reliability of class tests may show
that the tests do not correlate much better with themselves than
they do with other good measures. Before we can use our tests to
do any comparisons with other measures, we need to estimate
quantitatively the reliability of our tests. When is a numerical relationship
good enough to be useful? Cashin (1988, Kansas State Univ., Idea
Paper n. 20) recognized: "Correlations between .20 and .49
are practically useful. Correlations between .50 and .70 are very
useful but they are rare when studying complex phenomenon."
The nature of test reliability helps us to understand why correlations
in educational research are not higher. Given the "wobble"
associated with tests, Cashin's "very useful" values
are as good as we can expect to obtain by pairing another measure
with routine class exams.
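A rough way to see why test "wobble" caps these correlations is the classical correction for attenuation, in which the observed correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch below assumes that classical model. The two reliability values are invented for illustration and are not taken from Cohen, Feldman, or Cashin; only the observed r = 0.47 comes from the text above.

```python
from math import sqrt

def max_observable_r(reliability_x, reliability_y):
    """Ceiling on the observed correlation between two measures,
    given the reliability of each (classical test theory)."""
    return sqrt(reliability_x * reliability_y)

def disattenuated_r(observed_r, reliability_x, reliability_y):
    """Estimate the correlation that would be seen if both measures
    were perfectly reliable (correction for attenuation)."""
    return observed_r / sqrt(reliability_x * reliability_y)

# Assumed values, for illustration only.
test_reliability = 0.60     # hypothetical reliability of a routine class exam
ratings_reliability = 0.80  # hypothetical reliability of student ratings

ceiling = max_observable_r(test_reliability, ratings_reliability)
corrected = disattenuated_r(0.47, test_reliability, ratings_reliability)

print(f"highest r we could observe: {ceiling:.2f}")        # about 0.69
print(f"r = 0.47 corrected for wobble: {corrected:.2f}")   # about 0.68
```

Under these assumed reliabilities, even a perfect underlying relationship would show up as an r of only about 0.69, so an observed 0.47 is a good deal stronger than it first appears.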
Correlations work best when there is a range of scatter of both
sets of data under comparison, and there are enough data points
to make a correlation meaningful. Without such a range, some absurdities
can result. Consider for instance a situation in which tests reveal
that students learned little and student evaluations confirm as
much. Data like this are likely to condense into such little scatter
that aberrant points unduly influence the trend. Unrepresentative
correlations can result despite high agreement in the actual situation.
Consider the opposite situation in which nearly every student got
an "A" and all students agree the learning that took place
was high; unrepresentative correlation may again result for the
same reason. In such cases, other statistical tools are needed.
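The effect of too little scatter can be sketched with a small invented data set. In the example below, six students all score in the mid-90s and all rate the course between 4.8 and 4.9 on an assumed 5-point scale; a seventh student with a somewhat lower score and a top rating is then added. All numbers are made up for illustration, and `pearson_r` is a hypothetical helper written for this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented data: every student did well and rated the course highly.
test_scores = [94, 95, 96, 95, 94, 96]
ratings = [4.8, 4.9, 4.8, 4.8, 4.9, 4.9]

r_cluster = pearson_r(test_scores, ratings)

# Add one aberrant point: a lower score paired with the highest rating.
r_with_outlier = pearson_r(test_scores + [85], ratings + [5.0])

print(f"tight cluster only: r = {r_cluster:.2f}")
print(f"cluster plus one aberrant point: r = {r_with_outlier:.2f}")
```

The tight cluster gives a correlation near zero even though tests and ratings agree that learning was high, and the single added point swings the coefficient strongly negative. Neither number represents the actual situation.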
Assessing along the Continuum of Students' Learning
nears closing with 108 registrants as of 2/1/2005! Only a very few slots and books now remain.
Dr. Peggy Maki
Friday, February 25, 8:30 A.M. to about 3:00 P.M.,
Red Lion Inn by I-15 Pocatello Creek Road Exit
Breakfast & Lunch provided
Early Registrants Receive Assessing for Learning: Building
a Sustainable Commitment Across the Institution, 2004, Stylus
Press, 204 p.
To register, email nuhfed@isu.edu
and give your ISU mail box number.
Beginning with research on learning, this workshop will present collaborative
principles, practices, and strategies for assessing student
learning at the institution and department levels as students
progress through their studies. The workshop will demonstrate
collaborative steps involved in assessing student learning.
Who's Peggy Maki?
Higher education consultant Peggy L. Maki, Ph.D., specializes in assisting
institutions to integrate assessment of student learning into educational
practices, processes and structures. Her work also focuses on assessment
within the context of accreditors' expectations for institutional
effectiveness. She has recently been named to the Board of Contributors
of About Campus, Department Editor of Assessment for About Campus,
Assessment Field Editor at Stylus Publishing, LLC, and to the Advisory
Board of the Wabash Center for Critical Inquiry. She serves as a
faculty member in AAC&U's Institute on General Education; this
past summer she served as a faculty member in the Carnegie Foundation's
Integrated Learning Project. Beginning in summer 2005, she
will be teaching graduate seminars focused on assessment at two
universities.
Formerly Senior Scholar and Director
of Assessment at the American Association for Higher Education (AAHE),
she has served as Associate Director of the Commission on Institutions
of Higher Education, New England Association of Schools and Colleges,
Inc., New England's regional accrediting body; Vice President,
Academic Dean, Dean of Faculty, and Professor of English, Bradford
College, MA; Chair of English, Theatre Arts, and Communication,
Associate Professor of English, and Dean of Continuing Education,
Arcadia University, PA. She is a recipient of a national teaching
award, the Lindback Award for Distinguished Teaching.
She has conducted over 300 workshops
and keynote addresses on assessment both in the U.S. and abroad,
including New Zealand, Hong Kong, Mexico, Greece, Bulgaria, British
Columbia, and Malaysia. Her articles on assessing student learning
have appeared in AAHE's Bulletin, AAHE's Inquiry and Action
series, About Campus, Assessment Update, Change Magazine, The Journal
of Academic Librarianship, NetResults, and Proceedings of the International
Conference on Teaching and Learning, held at the National University
of Singapore (keynote address). Her writing also includes articles,
chapters in books, and a book on the teaching of writing. Additionally,
she conducts writing-across-the-curriculum workshops that develop
and document student learning.
She is in the process of editing
a book on assessment practices at the doctoral level and developing
a workbook to accompany her recently published handbook on assessment:
Assessing for Learning: Building a Sustainable Commitment across
the Institution, published in June, 2004, by Stylus Publishing,
LLC, and AAHE.