Friday, April 13, 2007

To curve or not to curve

Not king of the mountain

Dr. Free-Ride has a post over at Adventures in Ethics and Science about the whole business of grading. Shall we embrace the relativism of the curve or prefer the absolutism of the raw score? The curve places a student in the context of the performance of other students, while raw scores may be construed to measure the degree of an individual's mastery of subject matter. My own experience in three decades of assigning grades is a model of inconstancy.

I used to base the grades in all of my math courses on the normal distribution. So-called “standard” scores were very appealing. By converting all the raw scores into standardized measures relative to mean scores, I could instantly tell whether a student was above average (positive score) or below average (negative score). You obviously cannot do that with raw scores, where an 82% might be a spectacular result in a class whose average is 62%, but rather poor if the average were 85%. Standard scores are also a nice way to compare scores from different exams, since standard scores (or “z scores”) are all in terms of standard deviations. What could be better? Or more standardized!
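For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the conversion to standard scores. The function names are hypothetical, and the choice of the population standard deviation (`pstdev`) rather than the sample version is an assumption; the post does not say which was used.

```python
from statistics import mean, pstdev


def z_score(raw, class_mean, class_sd):
    """Standard score: how many standard deviations `raw` lies from the mean."""
    if class_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - class_mean) / class_sd


def class_z_scores(raw_scores):
    """Convert a whole list of raw scores to z scores relative to that list.

    Assumption: population standard deviation (pstdev) of the class itself.
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    return [z_score(x, mu, sigma) for x in raw_scores]
```

With an assumed standard deviation of 10 points (a made-up figure, since only the averages are given above), `z_score(82, 62, 10)` comes out to 2.0 and `z_score(82, 85, 10)` to -0.3: the same raw 82% is well above average in one class and slightly below it in the other.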

One fly in the ointment was pretty obvious. Students aren't comfortable with standard scores. (Not that they necessarily have a good grasp of raw percentages, but those have the advantage of being more familiar to them.) I recall particularly one student whose score put him smack in the middle of the class's grade distribution. Cocking his head at my explanatory sketch of the normal distribution's bell curve, he grinned and said, “Wow! I'm at the peak! I'm king of the mountain!” Uh, no. How I wish I could have been certain he was kidding.

My students never seemed to be very comfortable with grades assigned according to the normal distribution. Sure, it was nice when class averages on the exams were low and a score like 62% could still be considered a C, but there was much grumbling in the ranks whenever the averages were high and a score like 85% might be reckoned a C. Grading on the curve also tended to put me in something of a bind: Even a low-performing class ended up with a few A's because being just halfway competent in such a context made one into an outlier. Must I give an A to someone whose work really looks like that of a B student?

Naturally I reserved to myself the right to make the final decision on each individual grade, but last-minute tweaking was an open invitation to toss out the system and replace it with ad hoc judgments. On one side I seemed to have the Scylla of a grading straitjacket and on the other side the Charybdis of arbitrariness. It was a puzzle.

The students who weren't there

The last thing I tried before abandoning the curve system was a cohort of phantom students. In my gradebook I had four nonexistent pupils. One was entered as earning 90% on every assignment and exam. The others were similarly consistent: one at 80%, another at 70%, and the last at 60%. While their scores were not figured into the class averages and standard deviations, I used the class statistics to convert the mythical students' imaginary grades into standard scores. When I ranked my students according to the composite scores on which their grades were to be based, my nonexistent students provided convenient markers for partitioning the class into the conventional grade categories. Although I did not consider myself necessarily bound by the results (I am the decider!), this device always produced results that I found persuasive. In a weak class, for example, the putative 90% student would run away from the pack and sit alone at the top of the distribution, making it clear that I shouldn't feel obligated to parcel out any A's.
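The phantom-student device can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a record of the actual gradebook: the names `PHANTOMS`, `composite_z`, and `suggest_grades` are hypothetical, the composite is taken to be an equally weighted average of per-assignment z scores, and a student is assumed to earn the letter of the highest phantom he or she matches or outranks. The output is a suggestion, not a final grade.

```python
from statistics import mean, pstdev

# Phantom students: each "earns" the same percentage on every assignment.
PHANTOMS = {"A": 90.0, "B": 80.0, "C": 70.0, "D": 60.0}


def composite_z(scores, stats):
    """Average of per-assignment z scores (equal weights are an assumption)."""
    return mean([(x - mu) / sigma for x, (mu, sigma) in zip(scores, stats)])


def suggest_grades(gradebook):
    """Map each real student to a suggested letter grade.

    `gradebook` maps a student's name to a list of percentages,
    one per assignment, in the same order for every student.
    """
    columns = list(zip(*gradebook.values()))
    # Class statistics come from real students only; phantoms are excluded.
    stats = [(mean(col), pstdev(col)) for col in columns]
    if any(sigma == 0 for _, sigma in stats):
        raise ValueError("an assignment with identical scores has no z scores")

    # Convert each phantom's constant score into a composite z score.
    n = len(columns)
    markers = sorted(
        ((composite_z([pct] * n, stats), letter) for letter, pct in PHANTOMS.items()),
        reverse=True,
    )

    suggestions = {}
    for name, scores in gradebook.items():
        z = composite_z(scores, stats)
        # Highest marker the student matches or outranks; otherwise an F.
        suggestions[name] = next((ltr for mz, ltr in markers if z >= mz), "F")
    return suggestions
```

In a weak class such as `{"Ann": [75, 75], "Bob": [65, 65], "Cyd": [55, 55], "Dee": [45, 45]}`, the 90% and 80% phantoms both sit above every real student, so the sketch suggests no A's or B's at all.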

This was a big step toward using raw percentage scores, which I finally began to use without benefit of translation into z scores. Sure, scores from different exams weren't as easy to compare to each other, but it wasn't as though I had to do that particularly often (if ever). Just as depending on raw scores freed me from the problem of using a system that wanted to assign A's to the least bad of a weak lot, it also made it easier to be generous to a strong class. Where z scores might argue for giving low grades to the weakest students in a class, raw scores might show me quite persuasively that everyone in the class deserved a passing grade.

After all, what are grades supposed to do? Tell us whether you're better or worse than some classmate? To a degree, yes, but even more important is providing a capsule report of how well you know the subject matter. If everyone masters the material, then everyone must have good grades. If no one does, then no one gets a good grade.

Professor Procrustes assigns grades

That business about potentially using the curve to assign bad grades to good students has a personal angle for me. In my junior college, I found myself one semester with only two classmates in a physics class. Attrition had taken its toll (and it had been a small class to start with). The physics instructor, a cheerful and hapless fellow who stumbled through the material as best he could, was a firm believer in curve grading. As callow as we were, the students in the class doubted that a normal distribution would be a very good fit for our tiny cohort of three. Our teacher nevertheless bent himself to the task and proposed giving out one A, one B, and one C. (I think we were supposed to be pleased that he did not choose to give one A, one C, and one F.)

I was not unduly distressed by the ultimate outcome of the instructor's decision on grading, because I had cheerfully aced all the exams that semester and was sitting alone at the top of the grade distribution. My two classmates, however, were in a statistical dead heat, nestled side-by-side in the low 80s. They protested vociferously (well, one of them certainly did) that it was unfair to artificially distinguish between their two grades. I weighed in and offered my own opinion that there was not a dime's worth of difference between their scores. (I hope I didn't actually say it that way, but I don't recall.) The teacher was up for tenure that year and skittish at the thought of complaint letters going into his file (or so I surmise). He eventually saw sweet reason and gave each of my classmates a B, as I think they richly deserved.

While there are clearly features of curve grading that still appeal to me, I now think that grading is too rough a business to attempt to fit it into such an ideal model. My students are more comfortable with grades based on raw percentages (even if I do confuse them by using weighted averages of the various components of the course), and I find it simpler to compute target scores for future exams for them. (I just wish they understood those target scores better and strove more diligently to achieve them.)
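The target-score computation mentioned above is simple enough to sketch in Python. The function names and the component weights used in the usage note are hypothetical, since the post does not give the actual weighting, and the sketch assumes exactly one component of the course remains ungraded.

```python
def weighted_average(scores, weights):
    """Weighted average of the components graded so far."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def target_score(completed, weights, goal):
    """Percentage needed on the one remaining component to reach `goal` overall.

    `completed` maps graded components to percentages; `weights` maps every
    component of the course to its weight. Exactly one component must be
    missing from `completed`.
    """
    missing = [name for name in weights if name not in completed]
    if len(missing) != 1:
        raise ValueError("expected exactly one ungraded component")
    earned = sum(completed[name] * weights[name] for name in completed)
    total_weight = sum(weights.values())
    return (goal * total_weight - earned) / weights[missing[0]]
```

For example, with made-up weights of 20 for homework, 30 for the midterm, and 50 for the final, a student holding 90% on homework and 70% on the midterm has a running weighted average of 78% and needs 82% on the final to finish at 80% overall. A result above 100 means the goal is out of reach.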


Anonymous said...

I remember my diff equations prof telling us, as he passed back tests with one failing grade after another, that he was worried at first that he might be testing too hard (sadly, these scores were representative of our performance the whole semester), but that the latest test was identical to one he had given to a class back in the sixties (or seventies... I forget, he was pretty old) and those smarties had averaged a B, overall. He derived great relief from that: after all, he was the same teacher and it was the same test, so we must be dumber. Honestly, I think he was right.

Adiposis Dolorosa said...

I taught Basic Math and Basic and Intermediate Algebra for a year at Chabot Community College. I used the “clumping” system I learned from my second semester calculus teacher. In every class of sufficient size (in my limited experience even very small classes, about 12 or so) quiz and exam grades almost always come in clumps, usually with clear separation between the clumps. These usually corresponded very closely with A-level, B-level, etc. work. For example, with the following set of scores I would put the grade cutoffs as shown.

35 42 47 56 57 61 67 68 68 70 71 71 74 75 76 79 80 80 82 84 85 85 89 90 93 96 100
F 35-47
D 56-61
C 67-76
B 79-85
A 89-100

My students seemed to feel that this was fair, and I had no complaints about it. It also was helpful when a quiz turned out to be harder than I expected it to be – all the grades would be depressed, but the clumps remained. Anybody else use it?

Anonymous said...

“I think we were supposed to be pleased that he did not choose to give one A, one C, and one F.”

I know of a teacher who did just that: three students, graded on the curve. One got an A, one got a C, and one got an F. He didn't back down on it, either. (The exam was optional; the other N-3 students elected to do a term project instead.)

Zipi said...

I am firmly against curve grading. I could understand it maybe in a class with 300 students, but never in a class with 30 students. Any instructor who has taught for a few semesters knows that sometimes the students are smarter and work harder, and sometimes just the opposite. A good instructor is able to notice if a test was too hard and correct the grades accordingly, which should happen rarely. It is far more common to see variations in the ability of the class.

I see a theme going on here, though. In my home country of Spain (where grading is usually absolute) it is socially considered more important to be good. In America (where curve grading is so popular) it is socially considered more important to be better than others. Americans do not want to be losers, but Spaniards do not want to be failures. Another example: in Spain, the equivalent of the valedictorian title for US high schools is a distinction that is awarded to those who perform excellently according to some standards. In a given year, this can go to one, several, or no students.

Max Polun said...

The one problem with raw scores is that they assume the test is of reasonable difficulty, so that a 90% or so earns an A, 80% a B, etc. What I'm really addressing is that in at least collegiate-level physics (and my understanding is that most other sciences and engineering work this way too) the tests are made very hard, so that an average grade would be something like a 60% in a typical class. The result is that instructors in these fields always curve the tests.

I do agree that raw grades are better for less technical fields as well as for lower (high school and below) levels of these.

Anonymous said...

I teach high school and community college chemistry. Students always ask about a curve, which I just abhor. I tell them that I don't believe in curves because a curve leaves nothing to strive for, encourages mediocrity, and makes people want to beat up the kid who "blew the curve." They usually laugh and let it go.

At the JC, the "curve" is built into the course. An A drops to an 85%, a B to 75, a C all the way down to a 55%, and a D to a 45% (I didn't devise this system, but all introductory chem teachers use it). When students ask me to curve a test I tell them that they already have a generous one built into the class.

Seriously, with a curve at the lower levels of education (introductory courses, high school, etc.), aren't we really just rewarding students for not meeting standards?

Curves and extra credit make my skin crawl.

emeraldimp said...

It's been my experience that maxpolun, above, is correct (having just graduated with a BA in physics). One of my professors delighted in telling the story of the time when his class was given, as a final, several of the unsolved problems in physics... the best score was 5%.