Categories
Uncategorized

The Sincerest Form of Flattery

John Fell was Dean of Christ Church, Oxford, and also Bishop of Oxford. Doctor Fell had the reputation of a severe schoolmaster, but legend has it that when a student about to be punished was able to offer the following jingle as an extemporaneous translation of an epigram by Martial, the doctor excused him from punishment:

I do not love thee, Dr. Fell,

The reason why I cannot tell,

But this I know and know full well,

I do not love thee, Dr. Fell.[1]

—Tom Brown

 

If imitation is the sincerest form of flattery, I hereby flatter Brown with my own jingle:

I do not love thee, Mr. Klein,

The reason is, I must opine,

An argument, of which the crux

Is “Value added learning sucks.”

 

The following bit was written by a Victorian satirist who cast a cold eye on the House of Hanover:

George the First was always reckoned

Vile, but viler George the Second.

And what mortal ever heard

Any good of George the Third?

But when from earth the Fourth descended,

Thank God! at last the Georges ended.

—Walter Savage Landor

 

Let me then, in imitation, cast a cold eye of my own:

 

Mr. Klein talks lots of bunk, and

More bunk comes from Mr. Duncan.

Are any folks on earth such prats

As these scholastic bureaucrats?

For answers we must dodge their wind and

Catch a plane up north to Finland.


[1] Hence Mr. Utterson’s referring to Mr. Hyde’s repulsiveness as “the old story of Dr. Fell” before settling on the explanation that Hyde has “a foul soul that…transpires through, and transfigures, its clay continent.”  Does this make Stevenson the first literary figure to attach horror to a character by comparing him to an education administrator?

 


Pink Slime Education

A couple of years ago I wrote about the increased use of an ammoniated bovine slurry called “processed beef” in schools’ cafeterias. That posting compared these inroads to the invasion of the same schools by junk education, a “product” as inimical to true education as “processed beef” is to taste and well-being.

The latest news about “processed beef” is that the stuff is now known to its opponents as “pink slime,” a nickname given it by an Agriculture Department employee. That is inaccurate: it looks like Ortho® snail and slug bait that has been bleached pink, and both “products” are formed and crumbly, not loose and viscous. Its proponents (the slime’s, not the bait’s) also urge in its favor that it is not toxic, always a reassuring quality in things fed to children.

The other news is that the Agriculture Department will soon allow schools subscribing to its “food” programs to buy other kinds of meat to serve their students. The degree of reassurance this news actually provides will depend on what those other kinds turn out to be, but in the meantime the good news is cheering.

Now, if only someone in the Education Department would issue rules barring junk education from those same schools, we would have a large improvement to match the small. Unfortunately, the DoE is committed to a model of education that in many respects is precisely analogous to the production of pink slime for eating.

First, all the good stuff in education is being cut out, as the nutritious beef is cut from the scraps thrown into the slime-processing machines. All that is left is little gobbets of knowledge like bits of low-grade flesh and gristle. The scrappiness is ensured by multiple-choice testing, which works against synthesis and integration of knowledge. (Of course students can guess about a synthesis or interpretation when it is presented as one of four possibilities on a test, but in that case they still haven’t actually nourished their minds with a genuine synthesis.)

Second, the removal of fat from pink slime in the centrifuges where it is processed is equivalent in flavor-reduction to removing from education the tasty variety of classroom experiences in a rich curriculum, retaining only the lean leavings of “measurable behavioral objectives” that the testing can “capture.” That these leavings are not positively poisonous will be cold comfort to the kids who will have to endure a diet of them for twelve years.

Third, teachers are being turned by restrictive curricula and the narrow results demanded of them into burger-flippers of the mind. Since many of them used to be good chefs, and popular ones, they are demoralized and disgusted by having to preside over pink slime, scorching griddles, and tanks of hot bubbling fat. Diane Ravitch reports that the most heavily populated year of experience among teachers used to be the fifteenth year of teaching. Now it is the first. No wonder.

But one respect in which the serving and delivery of fast food does not resemble the Ed Biz these days is that when people on a steady diet of junk food suffer a deterioration of health, the waitresses are not arrested for causing grievous bodily harm. Unlike them, teachers are held responsible for whatever ill effects their education—or anything else!—may produce on their “customers’” learning.

So far is the Department of Education from admitting these massive shortcomings of NCLB and RAT that it is now trying to bring the benefits of junk education to colleges and universities. One hopes that action can be taken against the junkification of education as it has been against the slurry piped into school cafeterias, but regardless of hope, it is needed: the students fed this diet for twelve years will come away as walking damaged goods.

 


Words Words Words

A few more entries from The Didact’s Dictionary:

acronym n.: 1. in good prose, an alphabet soup stain. 2. in jargon, an initial obfuscation. 3.  in education branding (q.v.), repackaging by initials in order to make snappy what is essentially flaccid, as NCLB (No Child Left Behind), or to assert the truth of what is essentially false, as RAT (RAce to the Top).

behind n.: 1. (US education: NCLB) ahead.

GERM n.: [Global Education Reform Movement] a putative movement confined to the U.S. that unlike minuscule germs and genuine movements does not spread or move anywhere except by top-down inoculation and forced incubation.

value n.: 1. in general, relative worth. 2. (non-standard) a student’s result on a standardized multiple-choice test in one of the two subjects tested out of the six subjects taught. Formerly but mistakenly called “achievement.”

added part.: in education, usually with value: augmented by teaching, completion of assigned study, parental encouragement, private tutoring, independent exploration, interest by peers, and inculturation; but arbitrarily deemed the responsibility of a teacher.

metrics n.: in baloney (see balonist), the means of posing as a judge of academic qualities by asking multiple-choice questions and scoring answers, the way no other judge works.

truth n.: what corresponds to reality. antonyms: falsehood, bunk, rubbish, lie, baloney, b*******, value-added metrics.

And an observation:

The first PISA reading test results, released at about the time of NCLB’s passage, showed US schools behind Finland’s and those of thirteen other governments. Ten years after NCLB, the latest PISA reading test shows US schools behind those of sixteen governments, including three that were not in the first results. The results were worse for math and science. GERM is agitating for more of what led to this decade of success.

 

 

 


Teacher Effectiveness Ratings: Let’s Play Gopher Bash!

If you’ve been to game arcades, you know that most games now involve screens, joysticks, buttons, and software. One charmingly primitive game that thumbs its nose at electronic sophistication is Gopher Bash. In this game the player takes a mallet and waits over a “field” where gophers appear at random in their holes, bashing them with the mallet when they do. (A variation allows the player to stamp them with the foot when they appear.)

Currently the most interesting thing about this game is not its intrinsic goofiness, though in an arcade game goofiness is an attractive quality. More fascinating is that it inadvertently displays the governing model for teacher evaluation under NCLB (No Child Left Behind or Neglected Children Lose Brains: take your pick) and RAT (RAce to the Top).

First, there are district, state and federal officials charged with evaluating teachers based on Value Added Modeling. They have the mallets. What makes the teachers resemble a field full of gophers randomly popping up (or pushed up) for bashing is the use of statistical “estimates of teacher effectiveness [that] are highly unstable”[1] to rate them. One study cited in the report I have just quoted found that a third of teachers rated in the top 20% of effectiveness one year found themselves in the bottom 40% the following year. Another study found “year-to-year correlations of estimated teacher quality [range] from only 0.2 to 0.4. This means that only about 4% to 16% of the variation in a teacher’s value-added ranking in one year can be predicted from his or her rating in the previous year.” Thus, even a teacher in the top 20% one year may be set up by a statistical fluke for a bashing the following year, and there is no way to predict the lucky winners.
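The arithmetic behind that last figure is simple squaring: the share of one year’s ranking that can be predicted from the previous year’s is the square of the year-to-year correlation. A minimal check of the study’s numbers (my own illustration, not code from the studies cited):

```python
# The share of variation in next year's value-added ranking predictable
# from this year's is the squared year-to-year correlation (r squared).
for r in (0.2, 0.4):
    print(f"correlation {r} -> {r * r:.0%} of variation predictable")
```

A correlation of 0.2 yields 4%, and 0.4 yields 16%, exactly the range the study reports.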

A number of perverse consequences ensue from VAM-based evaluation. It unintentionally rigs the game against the teachers of the students most in need of special help, as studies cited in this report show. Teachers would respond rationally to this disincentive to teach them by running away from the bashing-field. They also run away from its arbitrary and capricious labeling. Sadly for me, who have always valued collaboration with my faculty colleagues, as readers of this blog know, this kind of rating system also appears to undercut cooperation within a faculty.

Bill Gates, whose foundation supports value-added modeling and teacher evaluation based on students’ test scores, said in a recent New York Times column that these numbers should not be placed in newspapers to shame teachers. Big deal. In New York the VAM numbers are a part of the public record, so potentially arbitrary humiliation is just a click away. More shameful to me than the Gopher Bash game is how many tourists in the garden of education have forgotten that teachers are not the gophers; they are the gardeners.


 


The Grand Academy of RAT

During his stay on the flying island of Laputa, Lemuel Gulliver visits the Grand Academy of Lagado to see projects that Jonathan Swift has imagined satirically. They include a project to build houses from the roof down and one to extract nourishment from excrement[1]. In a modern development proving that truth is more pungent than satire, the Grand Academy of RAT (RAce to the Top) has developed its own projects to amaze the visitor. Here are a couple of my favorite bits.

In Tennessee is a project to evaluate the success of physical education teachers by examining their students’ English and math test scores. Another seeks to have administrators evaluate teachers during five one-hour visits, each visit requiring ratings on 116 criteria, or one rating every 31 seconds, including time to watch the lesson. More can easily be found, for the RAT Academy is bursting at the seams.
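The 31-second figure is easy to verify (a back-of-the-envelope check of the arithmetic, not anything taken from the Tennessee rules themselves):

```python
# One hour of observation divided among 116 separate ratings.
seconds_per_rating = 60 * 60 / 116
print(f"about {seconds_per_rating:.0f} seconds per rating")
```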

Even Swift could not have satirized the reality that precedes these bizarre projects: applicants for RAT money, who need 2700 hours to fill out the application, discover that they must have as an “absolute priority” the intention to “measure” student “knowledge and skills” across a set of standards, including those “against which student achievement has been traditionally difficult to measure.”  (Emphasis added.) If I were building this house starting at someplace lower than the roof, I would require as an “absolute priority” the assessment of students, and their teachers, against standards that can be measured—or, better, judged. That means not “measuring” teachers using formulas based on no standards, whose confidence interval spans 53 percentiles, and proceeding as if there were no confidence interval at all.

But even then we would not be starting construction with the basement. Diane Ravitch has often suggested beginning with maternity and early-childhood education. And in a recent article she reminds her readers that Finland begins building schools at the basement and has some of the best schools in the world to show for it. What does starting at the basement involve? She notes that Finland “rejects all of the ‘reforms’ currently popular in the United States, such as testing, charter schools, vouchers, merit pay, competition, and evaluating teachers in relation to the test scores of their students.”

And where, asks Ravitch, did Finnish schools get many of the ideas that they do use? From the United States—an earlier United States. One idea Finland did not get from the US, which seems like another basement feature to me, is insistence on the thorough preparation of teachers in highly competitive and demanding teacher training programs. (Finland’s programs accept one applicant in ten.) Having trained their teachers, the Finns then repose in them absolute confidence to do their job, allowing them to devise their own programs and tests.

The contrasting domestic reality, full of people trying to extract nourishment from excrement, seems to be solidifying, though the product remains nutrient-free. This does not keep people like an assistant commissioner for curriculum and instruction at the Tennessee Department of Education from saying that “the process is leading to rich conversations about instruction.” I can imagine how wonderfully rich they are, and how deeply satisfying. The minutes of them could probably fill a Bristol barrel.


[1] “[the ‘projector’] had a weekly allowance, from the society, of a vessel filled with human ordure, about the bigness of a Bristol barrel.”


Coach of Many Colors

Readers of my posting on the flexible classroom (“The Class of a Thousand Spaces”) know that in a room where little is nailed down, much is possible. The other requirements of a successful flexibility are 1) a teacher whose approach to learning varies with the kind of learning to take place, 2) students who are ready to learn, in numbers that make flexibility feasible, 3) school administrators who are educational leaders (rather than, say, Ukrainian commissars or bean counters), and 4) things that work properly.

Teaching is, broadly speaking, of three main kinds: didactic instruction for imparting knowledge, coaching for development of skill, and Socratic teaching for encouraging the achievement of understanding. In the flexible classroom the flexible teacher will manage all three. Regrettably, most teachers’ focus tends to be entirely or mostly on didactic instruction.

It is also the focus of most educational software. Unlike the software, however, a practiced teacher can shift to coaching and Socratic instruction at need. Is there a good math teacher alive who does not insist that students show their work? That is because knowledge of the correct answer is only part of the learning involved. If a teacher sees a problem in the work, he or she can coach in the skill needed or try to establish an understanding in the student by asking particular questions based on the work and the student’s response.

And not just a math teacher. Robert Frost wrote a poem called “The Objection to Being Stepped On,” which opens “At the end of the row/ I stepped on the toe/ Of an unemployed hoe.” I invited my students to read it, at first without accompanying notes. One of them surprised me by saying that “this is a poem I can relate to.”

“Really?” I said. “Why is that?”

“Because it’s about a hoe!”

I suddenly understood, but pretended not to, and asked, “What interests you about a gardening tool? Do you enjoy gardening?”

He was puzzled, so I drew a hoe and explained its use. Though disappointed that the poem was not about a whore, he was already partly invested in it and ended up deciding that the rest of the poem made sense even if he could no longer relate to it. He eventually got the allusion to Isaiah and the wry, dry joke of a hoe as weaponry. I felt that he would not have got so far into it if someone had opened the discussion with a didactic statement (or internet screen) that “this is a poem about a man who hurts himself stepping on a garden tool.”  He would just have gone into parrot mode and learned the knowledge he needed in order to mimic understanding.

One of the reasons I was able to speak to him as much and as often as I did was that the class had fewer than fifteen students in it. Such numbers allow an extent of coaching and Socratic questioning that becomes impossible in a larger class.

This would be true not just in a poetry class but also in a math class. A math teacher in a small class can ask students to show their work, presumably not just to verify that they have actually worked, but also to see how they are proceeding or going astray. The aim should be to discuss the work and advise how it might go better. This, too, is easier in a small class than in a large one. The I. B. math tests require students to show their work and give (or withhold) marks for work done well, poorly, or not at all, regardless of The Answer (though of course the correct answer gains marks too). It is hard to see how software could do the same thing, or how a math teacher with students in three figures could examine each one’s work thoroughly.

Much of what I am reporting on seems to lie behind the success of the Mooresville (N. C.) schools in improving the quality of their students’ work, but that is not what The New York Times focused on. True, the subheadline said “It’s Not Just About the Laptops,” but the tag for the article at the top left of the printed page gives away the true point of view, saying, “Mooresville School District, a Laptop Success Story.” I would say, by contrast, that the success of the Mooresville schools is due to their trying to structure teaching and learning in more flexible and productive ways, and not primarily to their adopting laptops.

I wish them well, but some elements of their plan look flawed. As usual with schools on a budget trying to adopt expensive IT gadgetry, something has to give, and at Mooresville it is class size, which has risen from 18 to 30. When they have to get their students to a level of achievement that embraces skill and understanding as well as knowledge, they are going to find it more difficult than they think if they have abandoned a class size that allows teachers to be coaches and questioners as well as drop-in advice-givers. If, as reported, they divide their attention according to who has lower scores, they are not meeting the needs of the higher-scoring students, who have their questions too.

The following example, though small, is emblematic. Earlier this year a student of mine surprised me by mentioning an author’s use of polysyndeton, not a word I usually associate with 11th-grade criticism. Knowing him, I was sure that he hadn’t just idly copied the note from a source, so I pointed out that the example in question was a complex sentence whose ands did not all link grammatically parallel sentence elements. He understood me and made the needed change in his explanation. His problem is as deserving of attention as the problem of the boy attracted to hoes, and in a small class both problems will be attended to by the thorough teacher.

Mooresville will also have to find ways to deal with what the Times article generously or naïvely calls “growing pains.” I refer to connection and bandwidth problems as well as to the problem of students’ cutting and pasting or otherwise transferring “information” from one tab or window to another without real understanding. These are not “growing pains,” and the solution will not be to let things grow. School storerooms across the country are filled with unused stuff that was first described as having “growing pains.”

And their visitors will have to do something that the Times reporter has not yet done: they will have to see improvement as more than the right “balance between old tricks and new technology.” If studying the geography of a place means no longer having to make salt-and-flour maps, that is a real—but minuscule—advance. Far more important will be exchanging, where possible, the grid for more accommodating classroom models. More important yet will be replacing the monochrome teacher by a coach of many colors, aiming for a class size and classroom flexibility that allow the multifarious coach and questioner and his or her students to thrive in their joint enterprise. It sounds as if Kathryn Higgins, an English teacher referred to in the article, has found some ways to do so. Government officials will also have to start mandating in ways that don’t encourage well-meaning district administrators like Mark Edwards to look at widely publicized but superficial single high-stakes scores rather than exercise the subtlety they would probably like to use when evaluating their students. Reporters will also need to back away in their reporting from the old cliché that all conflicts in education boil down to a contest between The Future and The Old Farts’ Corner.

 


Read ‘Em and Weep

The homegrown Writing Assessment I discussed in my last posting sought to peg students’ writing against grade-by-grade standards that we teachers felt we could reasonably expect students to meet. The standards started with those of the senior year, and the question we asked of each essay was Would this piece of writing be acceptable to a teacher of first-year students at a good U. S. university? From that standard down to the one governing 9th-grade writing was a series of plausible steps.

At each grade we divided the range of possible writing into six different levels. Any essay that got a 4 or higher met the standard for that grade. (Essays getting 5 were significantly better than what was required, and essays getting 6 were dazzling.) Graduating seniors getting 4 could expect not to be massacred in freshman comp; those getting 3s were in some danger if they didn’t work hard. A 3 therefore meant “not quite at the mark.”

Each essay received a grade of 1 to 6 (or 0 for an evasion or no response) from two teachers, so the total grade was from 0 (rarely given) to 12 (also rarely given).  The two teachers had to be within 1 mark of each other, a requirement not hard to impose. Our work as a department ensured that we would look at our students’ writing in more or less the same way: what does it do that good 12th-grade writing ought to do?

And what characterized a senior essay we rated a 4? The student engaged with the question asked, on the whole successfully and thoughtfully. There was a balance between generalization and detail. The writing was unified and generally coherent. The student had a reasonably good grip on grammar and syntax. There was no waffle or baloney. The writing did not cloy. The diction was suitable to formal circumstances. Spelling was generally good. Having the result graded twice helped ratify the choice of marks (most of our composite grades were in even numbers) or suggested slight deviations from standard.

It is in light of our standard for giving a 4 that I read a startling article this week in The New York Times, which also discussed essays receiving a 4/6—in this case on the New York State Regents’ test. A quoted example began, “In life, ‘no two people regard the world in exactly the same way,’ as J. W. von Goethe says. Everyone sees and reacts to things in different ways. Even though they may see the world in similar ways, no two people’s views will ever be exactly the same. This statement is true since everyone sees things through different viewpoints.”  Looked at using our standard, the extract shows no problems of grammar, syntax, or spelling; but then it sinks.  Where is the successful engagement with the question? The balance of generalization and detail? Saying essentially the same thing three times is waffle, and the question-begging in the last sentence shows thoughtlessness. Yet this essay received a 4 from the Regents. I kept asking myself what the writer would need to do to get a 2.

Even that question was not answered in the article, which also showed short-answer paragraphs scored as 0, 1, or 2. The following sentence opened a paragraph getting a 1, presumably something like a 3 on the 6-point scale: “In the poem, the poets use of language was very depth into it.” If this is the opening sentence of a middling paragraph, what would open a bad one? Here are two sentences from an essay that received a 3 from the Regents: “Even though their is no physical conflict withen each other. Their are jealousy problems between each other that each one wish could have.”

I can’t imagine what “standard” such writing in a 12th-grader “nearly” meets. There doesn’t seem to be much use in “standards-based” education with such standards, or “data-based” education with such data. The author of the Times article notes that 12th-grade writers like this actually stood a decent chance of achieving the 65 required to pass the Regents’ test. To hear that the Chancellor of the Board of Regents wants to raise the passing score to 75 is thus not very comforting. I kept wondering how I could “teach” students for twelve years and have them “reach” the point of such an “achievement.”

 


Grading Parties

The school where I had my flexible classroom gave us much room for movement in other things too. In general, the principle was to trust the teachers’ professional judgment, and in general the principle worked. One of the things the English department decided to do was to implement an in-house testing program using homemade “instruments,” i.e. tests, to size up the proficiency of our students.

We called them simply the Reading Assessment and the Writing Assessment. In one the students answered open-ended questions about a passage they had read; in the other, they wrote an essay on a prompt devised by the teachers. What interests me about these tests in retrospect, particularly in light of all the hubbub about high-stakes testing, was how little we thought along the lines laid out by the high-stakes testing people.

We had two purposes in giving the tests. One was to see whether, in general, we could identify problems and strengths in classes of students in order to shape what we taught and how we taught it. The other was to give us a chance to work together at grading the assessments, thereby achieving a broad consensus on marking.

It’s a good thing we didn’t give these tests for “high stakes.”  In many years an insurgency developed among the students whereby they would throw their tests or otherwise rebel against taking them. Such insurgencies were rarely very big, but we usually ended up knowing what was happening and how they might affect results. But one year a particularly charismatic student managed to get a fairly widespread test-rebellion going. We found that in the class concerned, the tests were unusually poor, taken as a whole.

One set of ruined results did not have any dire consequences: we just marked them as we had planned and then discussed how we could head off test-rebelliousness in the future. But I wonder now what would have happened in a setting where such tests had high-stakes consequences for teachers’ raises or even their future employment. I have never heard of research being done on the problem of throwing tests even though such mischief is a credible threat to the integrity of test results.

The way we marked the tests was to call long meetings for such purposes, which we called “grading parties.” (Whoopee!) Before the parties we would devise standards for marking, and then at the beginning of the meeting we would go over them, assuring each other that we understood them or had modified them to take account of the group’s consensus about how to proceed. If results required it, we would revise our standards and re-apply them.

The department head (I was that lucky individual) would prepare the test papers so the students’ names could not be seen by the teachers, and I’d set them up so they would be graded by two teachers, neither of them that student’s teacher that year. After each paper was marked, I would compare grades to see whether they were close to each other. If there was a large discrepancy, I would have the two teachers confer about their marks. These conferences did not occur too often, but usually one teacher would end up modifying a mark. When neither one budged, I would step in. Since the test score was the sum of the two teachers’ grades, I would issue a revised score and give the two teachers a brief opinion backing my action.
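The double-marking rule described here can be sketched as a small procedure. This is my own illustration of the process (the function name and the exact interface for the head’s ruling are invented), not the department’s actual paperwork:

```python
def composite_score(mark_a, mark_b, head_ruling=None):
    """Combine two teachers' 0-6 marks into a single 0-12 score.

    Marks within 1 of each other are simply summed. A larger discrepancy
    sends the graders to confer; if neither budges, the department head
    steps in and issues a revised score (head_ruling).
    """
    if abs(mark_a - mark_b) <= 1:
        return mark_a + mark_b
    if head_ruling is not None:
        return head_ruling
    raise ValueError("marks differ by more than 1: graders must confer")

print(composite_score(4, 4))                 # two 4s meet the standard: 8
print(composite_score(3, 4))                 # within 1 mark: 7
print(composite_score(2, 5, head_ruling=7))  # head resolved the dispute: 7
```

The tolerance of 1 mark explains why most composite grades came out as even numbers: agreeing graders usually gave identical marks, and identical marks double.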

Even without conferences, teachers began to have a sense of whether they were too lenient or too severe in their marking by seeing how their marks compared to those of their colleagues. We also benefited from ideas our colleagues had, which had not occurred to us. The parties were a good way of smoking out problems that led to severe or lenient results. I found that by the end of a day of marking (for that is at least how long it usually took) we were closer in our marking than we had been at the beginning of the exercise.

We could use the results to determine what characteristic problems or strengths in writing or reading each class had and (after the beginning-of-year test) teach to those problems or strengths. Each of us had a stronger sense of how the others marked, and we tended to view essays critically in similar ways.

All these seemed like worthwhile goals for testing. We never considered the possibility of using the tests to rate teachers. This is not just because we shied away from such things: we had a program developed by an ingenious consultancy[1] whereby we visited each other’s classrooms and lessons, offering helpful and frank suggestions for increasing or strengthening the learning that took place there.

The reason that we didn’t use students’ results to evaluate teachers was that we didn’t believe in “proxy” values. When you evaluate a student, you evaluate a student. When you evaluate a teacher, you evaluate a teacher. It might make sense to use students’ results as part of the evaluation of a teacher if the method of evaluation relied on subtlety and good judgment and took other things into account than just students’ scores on a single “instrument,” but fortunately we were not in the position of having consequential decisions about teachers follow on our marking of students.

 


[1] Looking for Learning, developed by Fieldwork Education Services

 


Question Time

Much has been made of a recent study[1] that shows a correlation between the “effectiveness” of teachers as determined by the scoring of their students on “value-added metrics” and these students’ success in their later lives as determined by “markers.”  This muchness put me in mind—again—of Flannery O’Connor’s remark that “[t]he devil of educationalism that possesses us is the kind that can be cast out only by prayer and fasting.” I am not so sanguine as O’Connor: even prayer and fasting don’t often seem to work! I keep wondering what could possess whole communities of people to be stunned by a complex statistical study embodying years of data on millions of students when it concludes that children with good teachers do better than children with bad. One of the devils in the legion seems to be rather dim, but I will try to give the devil his due.

The original report is impressive in its thoroughness and the care with which its authors make and qualify their claims. They note, for example, that teachers in the study were not “incentivized based on test scores,” thereby skirting the effect of cheating, teaching to tests, and other “distortions in teacher behavior” that make the basis of value-addition different from what it would be in a population whose members had been “incentivized”—that is, in the real world of Campbell’s Law. There is no guarantee that results like this study’s would be similar to those in a district whose teachers were looking over their shoulders at the Value-added Reaper as he made his progress through their ranks. The twofold problem is that the use of “value-added metrics” encourages teaching to tests (the most-purchased books in the New York schools are books of preparation for tests), and there is evidence in research as well as the educational experience of the human race that teachers who teach to tests get worse results than teachers who don’t.

They caution that some elements of the value-added equation require “observing teachers over many school years” and may not apply in a “high stakes environment with multitasking and imperfect monitoring”—that is, precisely, the kind of environment in which hasty “consequential decisions” will be made on the basis of imperfect applications of the equation over the short term.

They point out as a justification for their aggregate numbers that “observable characteristics are sufficiently rich so that any remaining unobserved heterogeneity is balanced across teachers,” but those who want to use “value-added metrics” to make consequential decisions will be applying the equation to particular individuals without correction for “unobserved heterogeneity.”

They note that their study did not include the effect of peers and of parental investment in value-addition. While everyone agrees that the teacher’s effect on what students learn is pronounced, this seems like a significant omission that could have serious consequences for the teachers whose students’ peers and parents had a significant effect on the learning for which the teacher is held exclusively responsible.

The authors state that the study’s assumptions “rule out the possibility that teacher quality fluctuates across years.” Can this be? Raise your hand if your quality was as good in your first year of teaching as in your tenth.

In addition to what the authors say in qualification and limitation of their results, I have a few questions. They say that “value added is difficult to predict based on teacher observables.” Do the people who want to use value-added metrics as the basis for personnel decisions want to go a step further and assert that there is nothing observable that a teacher can actually learn or plan to do or avoid that will make a difference in how she or he scores? This seems like a bizarre position for someone who believes in life-long learning.

I want to understand in non-mathematical terms how “academic aptitude” is factored into the equation so that teachers will not be “penalized” for taking classes of difficult or refractory students. It seems to be a single number (ηᵢ) in the equation, but how is it derived?
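For concreteness, the general shape of such models (my own schematic sketch of the family of equations these studies use, not the authors’ exact equation or notation) is a regression along these lines:

```latex
% Schematic value-added model -- an illustration, not the study's own equation.
% A_{it} : test score of student i in year t
% X_{it} : observed controls (prior scores, demographics, classroom traits)
% \eta_i : the student's unobserved aptitude effect
% \mu_{j(i,t)} : the "value added" of the teacher j assigned to student i in year t
% \varepsilon_{it} : noise
A_{it} = \beta X_{it} + \eta_i + \mu_{j(i,t)} + \varepsilon_{it}
```

On this sketch, ηᵢ is not something anyone measures directly; it is inferred, largely from the student’s prior test scores, which is precisely where the question of how it is “derived” bites.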

I would like to know how many years’ value-added ratings they think a teacher should receive before the ratings can be said to reflect his or her actual performance, and I would like to understand the basis for this determination. It is one thing to say that we have some aggregate statistics that show teachers in general have certain effects on their students in the long run, and a rather different thing to say that these statistics can reliably rate individual teachers in one or two goes. This is particularly true given that the authors themselves say some elements of the value-added equation require “observing teachers over many school years.”
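The question of how many years’ ratings are enough can be made vivid with a toy simulation (my own illustrative numbers, not the study’s): if a teacher’s true quality is stable but each annual rating is noisy, a single year’s rating tracks true quality only loosely, and averaging more years helps only gradually.

```python
# Illustrative simulation (toy numbers of my own, not from the study):
# how well does an n-year average of noisy ratings track true teacher quality?
import random
import statistics

random.seed(0)

N_TEACHERS = 1000
TRUE_SD = 0.10      # spread of true teacher effects (test-score SD units, assumed)
NOISE_SD = 0.20     # year-to-year noise in a single annual rating (assumed)

true_quality = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]

def estimate(years):
    """Average `years` noisy annual ratings for each teacher."""
    return [
        statistics.mean(q + random.gauss(0, NOISE_SD) for _ in range(years))
        for q in true_quality
    ]

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * statistics.pstdev(xs) * statistics.pstdev(ys))

for years in (1, 3, 10):
    r = correlation(true_quality, estimate(years))
    print(f"{years} year(s) of ratings: correlation with true quality ≈ {r:.2f}")
```

Under these assumed noise levels, one year of data leaves the rating only moderately correlated with true quality; a decade of data is needed before the correlation becomes strong, which is the point of asking for the basis of the determination.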

Having asked my questions I now make a couple of observations. One of the study’s authors, according to The New York Times, says that value-added metrics should be used even though “mistakes will be made” and “despite the uncertainty and disruption involved.” It is disturbing to see someone so fastidious in the drawing of conclusions become so sweeping and remorseless in applying them, particularly when the study itself has just spoken to the need to “weigh the cost of errors in personnel decisions against the mean benefit from improving teacher value-added.”

The problem with “mean benefits” is that they have particular consequences. The authors have said that they think it would be more cost-effective to fire ineffective teachers (even mistakenly ineffective ones) than to give bonuses to effective ones. I keep wondering whether this kind of decision-making will be ethos-effective. I keep wondering who is going to be attracted to a profession governed by such principles and assumptions as those that lie behind value-added systems. “Drifters and misfits,” as Hofstadter called them? The authors of the study note that no observable teacher behavior correlates to value addition, so I wonder who will join a profession in which it cannot be said with confidence what he needs to do in order to be successful.
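One way to see how a “mean benefit” coexists with particular consequences is another toy calculation (again my own assumed numbers): if we “fire” the bottom decile by a single noisy year’s rating, a large share of those fired are not in the true bottom decile at all.

```python
# Toy calculation (my assumed numbers, not the study's): if one-year ratings
# are noisy, how many teachers fired from the bottom decile are misjudged?
import random

random.seed(1)

N = 100_000
TRUE_SD = 0.10   # spread of true teacher quality (assumed)
NOISE_SD = 0.20  # noise in a single year's rating (assumed)

# (true quality, noisy one-year rating) for each teacher
teachers = []
for _ in range(N):
    q = random.gauss(0, TRUE_SD)
    teachers.append((q, q + random.gauss(0, NOISE_SD)))

rated = sorted(teachers, key=lambda t: t[1])          # sort by noisy rating
fired = rated[: N // 10]                              # "fire" the bottom decile
cutoff = sorted(q for q, _ in teachers)[N // 10 - 1]  # true bottom-decile line
mistakes = sum(q > cutoff for q, _ in fired)

print(f"{mistakes / len(fired):.0%} of fired teachers were not truly bottom-decile")
```

The policy can still show a positive mean benefit in the aggregate, but each of those mistaken dismissals is a particular consequence visited on a particular person.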

The moral and intellectual world in which the discernment of quality was a matter of finesse or connoisseurship and in which reward and reprobation follow particular deeds or ways of doing things is the same one in which we could say without a quantitative rationalization that the students of good teachers do better than the students of bad. That world is also a place where both teachers and administrators take their duties seriously, including the duty to counsel and correct when needed and to accept counsel and correction when deserved or needed.

It might be worth ending with a note on the stereotype that people who are against value-added “measurement” are unionists, educational bureaucrats, or people with tenure to lose in a change of system. In my twenty-five years as a teacher I have never worked within a tenure-granting system.  I have never been in a union shop, nor have I been a member of a teachers’ union. I have never held an administrative position in education except that of Department Head. I have never worked in a teachers’ college. If I am against the kind of practice discussed in this posting, it is not because I have a hidden interest. It is because it seems wrong. I mean both wrong-headed and culpable.


[1] “The Long-Term Impacts of Teachers: Teacher Value-Added and Student Outcomes in Adulthood” by Raj Chetty and John N. Friedman of Harvard and Jonah E. Rockoff of Columbia. http://www.nber.org/papers/w17699.pdf

 

Categories
Uncategorized

(Brick and Mortar) Schools

It’s time to stop using the expression “brick-and-mortar school” as if there were any other kind. I mean in particular to oppose the terms “virtual” and “on-line” being applied to schools, for such network-connections-and-data-bases don’t act as schools except in a threadbare and impoverished sense.

Or are they even as good as threadbare? The standard of “progress” mandated by No Child Left Behind, described generously as “very crude” by Professor Gary Miron of Western Michigan University, would qualify as threadbare. And yet applying even that standard, a recently released study co-authored by Professor Miron showed that on-line “schools” did worse at “improving” their students than “brick-and-mortar” schools did. (It also showed that for-profit “schools” did worse than non-profit schools.)

The late sociologist James S. Coleman did a large study reported in his 1987 book Public and Private High Schools. In it he found that the single strongest correlate of effectiveness in ordinary high-school education was that the schools in which the effective education took place were functional communities. A network is not a community, though some communities do function partially through networks. There is certainly nothing communitarian in an arbitrarily collected group of young people sitting by mandate in front of screens. Nor do such groups bear any resemblance to the ad hoc groupings (not communities) sometimes found on social networks, whose members make a choice to share some limited interest or focus. That is one reason we distinguish between communities and interest groups or single-interest constituencies; but we should also distinguish between networks and any of those other collections, for a network need have none of the above.

For something to be “virtual” in the traditional sense, it must operate under some kind of power or agency (a “virtue”) that has an essential and sufficient effect even though the thing in question does not take its usual form. What essential and sufficient agency is at work in a “virtual” school? Surely the answer can’t be “instruction”! Of the three kinds of learning—knowledge, skill, and understanding—educational software can hope to deliver only knowledge. Skill requires coaching, and the last time I looked, almost all coaches were genuine human beings, for how could they not be in order to adapt themselves to their students’ needs? And the promotion of understanding requires Socratic questioning, which software cannot provide, for something like the reason that it cannot play a good game of gō.[1]

When I think of software providing understanding, it puts me in mind of the electronic confessional in THX 1138. The Donald Pleasence character receives “understanding” from his “confessor,” but the movie invites us not to congratulate the effectiveness of future cybernetics but to mourn the threadbareness of a life to which that “confessor” could offer anything significant.

In the most famous example of Socratic questioning, Socrates himself hears his acquaintance Thrasymachus assert that justice is the interest of the stronger party. Socrates asks him a series of questions whose answers lead Thrasymachus to understand that justice cannot possibly be what he has just claimed. Socrates holds him to each answer he gives by asking one more question about that answer till Thrasymachus grasps fully why he was in error to make that assertion. This is not something that can be programmed because—in real life, if not in a dialogue planned by Plato—the programmer cannot know what a respondent’s next answer will be to an open-ended question, and it is these open-ended questions that force the respondent to step out of the box of slogans and memorized lines that he brought to the discussion. Until then, “justice” might as well be the montillation of traxoline.
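To make the programming point concrete, here is a scripted question-and-answer toy of my own devising (not any real product): each question maps only the answers its author anticipated to a follow-up, so any genuinely open-ended reply falls through to a shrug.

```python
# A toy scripted "Socratic" dialogue: each question maps anticipated answers
# to a follow-up question. Anything the author did not foresee dead-ends.
SCRIPT = {
    "What is justice?": {
        "the interest of the stronger": "Is a ruler ever mistaken about his interest?",
        "giving each his due": "Is it just to return a weapon to a madman?",
    },
    "Is a ruler ever mistaken about his interest?": {
        "yes": "Then is obeying him always in the stronger's interest?",
        "no": "Has no ruler ever made a law that harmed him?",
    },
}

def follow_up(question, answer):
    """Return the scripted next question, or admit the script is exhausted."""
    branches = SCRIPT.get(question, {})
    return branches.get(answer.strip().lower(),
                        "I have no further question for that answer.")

# An anticipated answer gets a follow-up; an unanticipated one gets nothing.
print(follow_up("What is justice?", "the interest of the stronger"))
print(follow_up("What is justice?", "whatever promotes human flourishing"))
```

Thrasymachus’s first answer happens to be in the script; his fifth cannot be, because the programmer would have had to know in advance every turn a live mind might take.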

Good teachers understand all of this, which is why some teachers in Idaho (and elsewhere) are protesting the mandating of online “schooling.” One of them, Ms. Ann Rosenbaum, sounds like a formidable person and a dedicated teacher, and one not to shrink from a struggle. It is a pity that she must come up against such sorry adversaries as Idaho’s governor Otter and its schools superintendent Luna. Luna falls back on vacuous clichés like “schools of the 21st century,” while Otter says that if Ms. Rosenbaum “only has an abacus in her hand, she is missing the boat.” Of course, that is not the only thing that Ms. Rosenbaum has in her hand, as the article shows. (Thankfully, it doesn’t show what Governor Otter has in his hand.)

But she doesn’t need anything in her hand when she is using the Socratic method: “engag[ing] students with questions” and “using each answer to prompt the next” question. Of all the questions Socrates asks Thrasymachus, only the first one could appear on question-and-answer software. Ms. Rosenbaum doesn’t want to give up a rich line of questioning for haring around fields of knowledge with questions asked arbitrarily, which is basically what question-and-answer software does.

A “virtual school” is not a community, nor can it be one. It does not have a sufficiency of action by virtue of which it offers a complete education. It will provide coaching for skill at about the same time that country clubs can replace the pro shop by the machine shop. It cannot impart or ratify understanding. Why are we calling it a “school,” and why are we moving towards such things? I am afraid the answers to these questions have little or nothing to do with education. While we are turning up the answers, let us refrain from “saying the thing that is not,” as Jonathan Swift called it[2]; for an on-line “school” is not a school.

 

 



[1] This was written before AlphaGo, but I still find it unlikely that educational software will be able to respond in a genuine way to students’ comments any time soon. In the meantime, human beings should do just fine as teachers, and barkers of “educational” software should wait until it can do so before touting it. (17 March 2016)

[2] While Gulliver was in the land of the Houyhnhnms.