Data, Knowledge & Life

Weiwei Cheng's blog

The ISI Case – A Critical View

An issue of concern to computer scientists is the common tendency to use the database of Thomson Scientific’s ISI Web of Science as the reference for publications and citations. This resource was devised for the natural sciences; while the corresponding community seems to be satisfied with its applicability there, it is inadequate for computer science.

The principal deficiency of ISI comes from its arbitrary classification of what is or is not worthy of being counted. The selection criteria are arbitrary and opaque. Most conferences are not listed; books are generally not listed; conversely, some references are included indiscriminately.

The results make any computer scientist cringe. For Niklaus Wirth, a famous computer scientist honored by the Turing Award (the highest honor in computer science, commonly referred to as the Nobel equivalent of the discipline) and known in particular for his design of Pascal, the ISI database lists a number of minor papers that happen to have appeared in indexed publications, but not his 1970 “Pascal User’s Manual and Report” (with Kathleen Jensen), published as a book and one of the best known references in all of computer science. Ask any computer scientist what is the most influential publication in the discipline, and most will cite Donald E. Knuth’s The Art of Computer Programming book series, which has acquired legend status; that reference does not figure in the ISI database. (On Google Scholar it gets over 15,000 citations, an astounding number.) Of the many articles that Knuth — also a Turing Award winner — has published, the three most frequently cited according to Google Scholar, each with about 1000 citations, do not even appear in the ISI records.

Evidence of how ISI collapses for computer science is “internal coverage”: the percentage of citations that cite a publication in the same database. Whereas ISI’s internal coverage exceeds 80% for physics or chemistry, it is only 38% for computer science.
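To make the definition concrete, here is a minimal sketch of how internal coverage could be computed from a list of citation pairs. The function name, the data layout and the toy identifiers are all assumptions made for illustration; they do not come from ISI or from any real dataset.

```python
def internal_coverage(citations, indexed):
    """Return the share (0-100) of citations whose target is itself indexed.

    citations: iterable of (citing_id, cited_id) pairs from indexed papers.
    indexed:   set of publication ids that the database contains.
    """
    cited = [target for _, target in citations]
    if not cited:
        raise ValueError("no citations to measure")
    inside = sum(1 for target in cited if target in indexed)
    return 100.0 * inside / len(cited)


# Hypothetical toy data: two indexed journal papers make five citations,
# of which only two point at works that are themselves in the database.
indexed = {"journal-A", "journal-B"}
citations = [
    ("journal-A", "journal-B"),
    ("journal-A", "conf-paper-1"),
    ("journal-A", "book-1"),
    ("journal-B", "journal-A"),
    ("journal-B", "conf-paper-2"),
]
coverage = internal_coverage(citations, indexed)
print(f"internal coverage: {coverage:.0f}%")  # prints "internal coverage: 40%"
```

In a field where conferences and books carry much of the literature, most citation targets fall outside the indexed set, and the figure drops accordingly.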

An example of the arbitrariness of ISI criteria is Springer’s Lecture Notes in Computer Science (LNCS), which ISI until recently classified as a journal, whereas it is simply a book series offering numerous conference proceedings and some monographs. Lumping all LNCS publications into a single journal category was inappropriate, especially since many high-quality conferences not published by LNCS are not listed. For example:

  • The International Conference on Software Engineering (ICSE), the top conference in a field that has its own special ISI category, is considered a premier publication venue by anyone in the field; it is not indexed by ISI.
  • Any software engineering workshop published in LNCS, the kind of venue where an author would typically try out an idea before it is ready for submission to ICSE, is indexed by ISI.

As another example, ISI indexes SIGPLAN Notices, a publication of the Programming Languages group of the ACM (one of the two major professional societies in computer science). SIGPLAN Notices is actually an unrefereed publication, used in its ordinary issues to publish drafts, notes, letters; but it devotes special issues to the proceedings of some of the most prestigious conferences such as POPL and PLDI. Unlike those to ICSE, contributions to these conferences will appear in ISI, but treated in the same way as an informal reader’s note in a regular issue.

The database has little understanding of what constitutes computer science. The 50 most cited references in computer science according to ISI include such entries as Chemometrics in food science (#13), from a journal called Chemometrics and Intelligent Laboratory Systems, a topic and a publication entirely alien to computer science. This is not just an isolated example; most of the entries on the list, even those which are related to computer science (usually from specialized areas rather than the core of the discipline) are not recognizable to a computer scientist as milestone contributions. The cruelest comparison is with the list of most cited computer science works on the CiteSeer site, devoted to computer science; while imperfect like any such selection, the CiteSeer lists many articles and books familiar to all computer scientists. It has not a single entry in common with the ISI list.

Merlet et al. note that the top-ranked ISI journal is 195th on CiteSeer, and the top CiteSeer journal is 26th for ISI. While some might be tempted to use this as a reason to dismiss rankings altogether, examination of the differences shows that they simply reflect how far off ISI is from the general understanding of computer scientists.

The ISI list of “highly cited researchers” reflects the database’s ignorance of computer science. Wirth, Parnas and Knuth, all iconic names in the field, do not appear. Of the ten Turing Award winners between 2000 and 2006, only one is listed (Ronald Rivest, the R of the RSA cryptographic algorithm), but not, for example, Adi Shamir (the S of RSA), another revered figure of theoretical computer science.

Although one might indeed expect ISI to give better results for Theory work, closer than Systems to mathematics (a long-established discipline), the difference is marginal because of the fundamental deficiencies in the process.

Since ISI indexing is based on an opaque process with no room for assessment or appeal of decisions, the situation is unlikely to improve.

Alternatives such as CiteSeer and Google Scholar are subject to criticism as well. While CiteSeer attempts to eliminate self-references, Google Scholar does not; neither project publishes its precise inclusion criteria. These deficiencies, however, are negligible when viewed against those of the ISI Web of Science:

In assessing publications and citations, the ISI Web of Science is inadequate for most areas of computer science and must not be used. Alternatives, imperfect but preferable, include Google Scholar, CiteSeer and (potentially) the ACM Digital Library.

Anyone in charge of research assessment should be aware that attempts to use ISI for computer science will cause massive opposition. Announcements of such plans have led some computer scientists to reject all measurement-based techniques, as in the examples cited above. This is an overreaction; but decision-makers and scientists from other fields must not try to impose on computer scientists a scheme that is demonstrably inapplicable to their discipline.

Beyond the specific deficiencies of ISI, all systematic studies of citation databases show wide variations between the results they yield, in particular for computer science. This has led some authors to suggest systematic reliance on several databases, a commendable approach that seems hard to impose in practice. Perhaps software will appear that performs this automatically. In the meantime it is again essential to remember the limitations of data quality in today’s databases, and avoid any career-affecting decision based on metric indicators whose validity has not been checked thoroughly.
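As a rough illustration of what such cross-checking software might do, the sketch below flags works whose citation counts disagree widely across databases. Everything in it is an assumption: the function name, the database labels, the counts and the ratio threshold of 2.0 are hypothetical and are not taken from any real database or study.

```python
def flag_disagreements(counts_by_work, max_ratio=2.0):
    """Return works whose citation counts differ too much across databases.

    counts_by_work: {work: {database: count or None}}; None means the
    database does not index the work at all.
    A work is flagged when any database fails to index it, or when the
    highest count exceeds max_ratio times the lowest (arbitrary threshold).
    """
    flagged = []
    for work, counts in counts_by_work.items():
        values = list(counts.values())
        if any(v is None for v in values):
            flagged.append(work)
            continue
        low, high = min(values), max(values)
        if high > max_ratio * low:
            flagged.append(work)
    return flagged


# Hypothetical counts for three works in two made-up databases.
counts_by_work = {
    "work-1": {"db_a": 120, "db_b": 100},   # broadly consistent
    "work-2": {"db_a": 900, "db_b": None},  # missing from one database
    "work-3": {"db_a": 50, "db_b": 400},    # eightfold difference
}
print(flag_disagreements(counts_by_work))  # prints ['work-2', 'work-3']
```

A flagged work is only a prompt for a human to look closer; the check says nothing about which database is right.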

Written by Weiwei

29/08/2009 at 15:14

Posted in Reposts
