Saturday, December 13, 2008

PL 61/08: Industrial and digital statistics

Filed under: statistics — plinius @ 8:09 pm

Statistical data are produced by statistical systems.

Sunset industry …

This is true of all regular statistical variables required by modern organizations – including libraries. Data that are needed for normal operations must be produced by routinized systems.

Field testing

The field of library statistics is small and specialized. But it is important for people who work in and with libraries. Today, our most urgent task for library statistics is to renew our statistical systems. Developing – or “inventing” – new variables and indicators must be part of that renewal. But field testing is essential. In technology, it is axiomatic that inventions will not make an impact unless they are tested and found useful under ordinary working conditions. The same applies to statistical innovations.

Inventing new indicators is easy. Testing them with real data is laborious. Getting them accepted by organized statistical systems is back-breaking work. Our current systems are still heavily dependent on traditional ways of thinking and working. They were developed in the late 20th century, long before the web made a deep impact on society. The systems reflect what I would call an industrial model of statistics.

The industrial model

Typically, these statistics are based on annual reporting of variables that are easy to gather from existing databases. Publication is also annual. The producers of statistics do not make the full data sets available for researchers to play with. They select the tables and the cross-tabulations we are allowed to study. The regular print publications are characterized by lots of numbers and very superficial analyses. They do not, I would say, educate their readers.

This is not the fault of the producers, however. It is a result of limited demand. Most librarians avoid statistics if they can. They do not want to be educated. Thus, there is no pressure on statistical agencies from the outside to improve their products. Repetition and complacency prevail.

The digital model

The web, of course, is changing everything: libraries, librarians, users and statistics. In a knowledge economy, statistical data must be treated as an economic rather than as an administrative resource. Statistical production should be judged by the same standards as other forms of knowledge production. Libraries are knowledge factories rather than book media collections. Statistical systems should produce the data librarians need to sustain and develop their services – as well as the data politicians need to make decisions and evaluate policies.

This implies, I think, that most data should be collected at short intervals – hours, days, weeks, months – rather than years. They should, in most cases, be quality checked and published as soon as they have been collected. These are hardly sensitive data, so the full data sets should be made available in digital form for study by researchers and interested parties.

Library agencies should also do their level best to train librarians in practical, operational numeracy.

In library schools, most statistical courses are research- rather than practice-oriented. Students learn about normal distributions, t-tests and sampling errors, which most of them will never use. But they are seldom prepared for the statistical problems they will meet in real life: how to interpret official statistics, how to compare data from different libraries, how to construct and evaluate operational indicators.
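To make the last point concrete, here is a small Python sketch of the kind of exercise I have in mind: turning raw annual counts into per-capita indicators so that libraries of different size can be compared. The three libraries and all their figures are made up, and loans and visits per capita are only two possible indicators among many.

```python
# Hypothetical annual figures for three libraries of different size.
libraries = {
    "Library A": {"population": 50000, "loans": 250000, "visits": 200000},
    "Library B": {"population": 8000, "loans": 56000, "visits": 24000},
    "Library C": {"population": 120000, "loans": 480000, "visits": 600000},
}


def per_capita(figures, variable):
    """Divide a raw count by the population served."""
    return figures[variable] / figures["population"]


# Build one small indicator table per library.
indicators = {
    name: {
        "loans_per_capita": per_capita(figures, "loans"),
        "visits_per_capita": per_capita(figures, "visits"),
    }
    for name, figures in libraries.items()
}

# Rank libraries by loans per capita, highest first.
ranking = sorted(
    indicators,
    key=lambda name: indicators[name]["loans_per_capita"],
    reverse=True,
)

for name in ranking:
    row = indicators[name]
    print(f"{name}: {row['loans_per_capita']:.1f} loans, "
          f"{row['visits_per_capita']:.1f} visits per capita")
```

In this invented example the smallest library has the fewest loans in absolute terms but the most loans per capita, while the largest leads on visits. Raw totals alone would hide both facts.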

Without a real demand for interesting data from people who are able to play and argue with numbers, library statistics will remain at the margin of our professional discussions. I am not saying that statistics is a VIS – a Very Important Subject. It is more like cartography. Without statistical data, we act blindly. With statistics, we may still disagree – but not about the landscape.

Early adopters

Library statistics cannot be changed unless libraries change. Libraries cannot be changed unless communities, users and educational institutions change. But this is hardly a problem. Our environment is changing before our eyes.

Statistical producers have a choice. They can join the early adopters, the organizations at the forefront of change; the moderate and careful reformists in the middle; or the stubborn defenders of tradition at the back. I write for the first group, but will be happy to discuss these questions with everybody.


