
April 3, 2006

Amazon Text Stats — Find out how many big words before you start a book


Noam Cohen wrote in the March 19 New York Times about this Amazon feature, which uses a variety of indexes of clarity and ease of reading to tell you exactly how difficult a book is before you buy it, rather than after, when you find yourself baffled, befuddled and bewildered by the bounty of big words and endless sentences.

Most interesting, and the stuff of endless time-wasting for those few of you prone to such behavior.

Here's the article.

    Book, How Do I Love Thee? Let Me Count the Words

    Who would compare "The Story of Babar" to the prize-winning novel "Everything Is Illuminated"?

    Who would call James Joyce's "Ulysses," the bane of many an undergrad, a work for a seventh grader?

    With the aid of software at Amazon.com known as Text Stats, anyone can make such comparisons, which are based on the crudest sort of computer analysis of a book: how many big words there are, and how long the sentences run.

    Such simple statistical scrutiny has been around for decades — used to determine a book's appropriateness for a certain grade level, among other things.

    But software like Amazon's automates the process, and the Internet lets anyone see the results.

    To what end? ask some literary scholars, who see such techniques as little more than superficial gimmicks.

    But others say they are a tool to gain insight into the authorship of and influences on a text, whether the work of Bob Dylan, Shakespeare or your average high school student.

    When Amazon gets the right from a publisher to let readers "search inside" a book, Text Stats tallies the average length of a sentence and amasses little piles for each word used. (Or big piles, as in the case of the King James Bible, for example, where the count for "loin" is 1,548; "behold," 1,426; and "lord" 7,082.)

    The software then ranks a book for clarity and ease of reading on a variety of indexes.

    For example, "The Story of Babar" has a Flesch-Kincaid Index score of 6.1 (sixth-grade level), the same as "Everything Is Illuminated" by Jonathan Safran Foer.

    Their "fogginess" quotients, an index similar to Flesch-Kincaid, are very close, too, though the Foer book is slightly less clear — 8 percent of its words are "complex," compared with 7 percent for "Babar."

    Text Stats also produces concordances, lists of the 100 most-used words in a book.

    It is no surprise that the ratings made by computers, and the connections between books that they reveal, are often bizarre, since the software is not concerned with meaning and context and is unaffected by subjective factors like author reputation.

    "It's machine reading; it is the kind of reading no one person can do," said Ben Marcus, director of the graduate fiction program at Columbia University and a novelist whose works are not accessible to Amazon's computers.

    "I think it is really fascinating, anything that takes us closer to a text, that makes us aware that it is put together to create an illusion."

    The flaw is obvious, too.

    "The computer doesn't recognize how sentences relate to each other," he said.

    "Gertrude Stein or Beckett may write in elementary sentences, but they take such huge leaps between them."

    But that thickheadedness can be useful, some scholars say.

    In "Alice in Wonderland," for example, a statistical study can "place this text against a large collection of 19th-century fiction to see which other works it resembles on a stylistic basis — what genre does it fit best, judging, say, from patterns of use of very common words?" Hugh Craig, who teaches at the University of Newcastle in Australia, wrote in an e-mail message.

    "But it would be essential to do the reading and analysis in the normal way as well, to see what it is that makes the patterns."

    Richard Abrams of the University of Southern Maine said that he could get the big picture of a writer from statistical analysis.

    In preparing for a seminar on Mr. Dylan's lyrics, he said, he found it useful to consult a concordance of the 10 most used words in the lyrics, which included, he said, "babe" and "dark."

    "For someone who had Dylan on the brain, there was an absolute sense of familiarity," he said. "You knew you were looking at a Dylan favorite word list, it showed Dylan as a Romantic."

    Still, statistical analysis like this can bring to mind the reported critique of Mozart by the Austrian emperor Josef II: "too many notes."

    Helen Vendler, the Shakespeare critic at Harvard, had not heard of Text Stats but speculated that "people will get bored by it — especially if it insults your intelligence by saying 'Ulysses' is at seventh-grade level."

    Likewise, she said a "concordance is not particularly interesting reading."

    Amazon says it likes Text Stats because it keeps readers at the site longer comparing and contrasting books.

    "It is definitely a feature that we view as having a 'sticky' aspect," said Brian Williams, the senior product manager in charge of the Text Stats functions at Amazon.

    Mr. Williams said he had heard complaints about the rating of "Ulysses" but explained that Text Stats was "just one tool."

    He said he had read blog postings from authors discussing their score, always tongue in cheek.

    "It should be tongue in cheek," he said.
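
For those few of you prone to such behavior, the kind of counting the article describes (average sentence length, share of "complex" words, a Flesch-Kincaid grade, a fog index and a list of most-used words) can be roughed out in a few lines of Python. This is only a sketch and not Amazon's method, which the article does not spell out. The function names are made up for illustration, the sentence splitter and syllable counter are crude heuristics, and "complex" is assumed to mean three or more syllables. The constants are the standard published ones for the Flesch-Kincaid grade level and the Gunning fog index.

```python
import re
from collections import Counter

# Words are runs of letters, optionally with one internal apostrophe ("don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def split_sentences(text):
    """Naive splitter: treat any run of '.', '!' or '?' as a sentence end."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text, top_n=100):
    """Return simple readability numbers and a list of the most-used words."""
    sentences = split_sentences(text)
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # "Complex" is assumed here to mean three or more syllables.
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)

    return {
        "words_per_sentence": words_per_sentence,
        "complex_word_percent": 100 * complex_share,
        # Flesch-Kincaid grade level, standard formula.
        "flesch_kincaid_grade": (
            0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
        ),
        # Gunning fog index, standard formula.
        "fog_index": 0.4 * (words_per_sentence + 100 * complex_share),
        # The top_n most frequent words with their counts.
        "concordance": Counter(words).most_common(top_n),
    }
```

Calling `text_stats(open("book.txt").read())` on any plain-text file of your own (the file name is just a placeholder) returns a dictionary of these numbers. Because the syllable counter only counts vowel runs, expect the scores to drift from Amazon's, and note that short simple sentences will score as easy reading no matter what leaps the author takes between them, which is exactly the flaw Ben Marcus points out.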

April 3, 2006 at 04:01 PM




You should not reproduce the entirety of this guy's article on your own blog!

Posted by: cm | Feb 10, 2009 7:17:17 PM
