Missing dark matter found in NLS catalogue

You know how those physicists are worried about all that missing dark matter in the Universe? Well, I think I have found it inside our main catalogue. Award me that Nobel Prize for Physics now.

Here at NLS we are currently implementing a new resource discovery tool, Aquabrowser, that will help us bring our collections together so our customers can find all of the stuff in the Library with one search. Once it's complete, folks will be able to find all our books, digitised content, films and manuscripts, all the articles and information on our website, and even resources external to the Library such as the Times Digital Archive and other databases. Pretty amazing, huh? But it's easier said/typed than done. There is the small matter of mapping together the metadata from our different databases and collections. I didn't duck quickly enough and got that job. Lucky me. 😉

But anyway, this week I have been focussing on the mapping for our main catalogue, and while doing this I also analysed the 4.2 million bibliographic records in the database. I was particularly interested in the Library's use of the MARC21 6xx fields, which indicate the subject of an item. As you probably know better than me, you shouldn't rely on the title of a book to tell you what it is about (e.g. Richard Dawkins's The Blind Watchmaker, which is about evolution rather than watches), so librarians apply subject headings. Given the importance of subject for information retrieval, I counted how many records in our main database had subject headings or a classification number, and was surprised to discover that only about half of them had any subject metadata at all (be it LCSH, LC classification or Dewey).
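
For anyone curious how a count like this might be done, here is a minimal Python sketch. It is not the script behind the figures above. It assumes each record has already been reduced to a list of its three-digit MARC21 tags (for example with a MARC parsing library such as pymarc), and it counts a record as having subject metadata if it contains any 6xx field, an 050 (LC classification) field or an 082 (Dewey) field. The function names and sample records are hypothetical.

```python
from typing import Iterable, Tuple

# Assumption for illustration: classification fields that count as subject metadata.
# 050 = Library of Congress Call Number, 082 = Dewey Decimal Classification Number.
CLASSIFICATION_TAGS = {"050", "082"}


def has_subject_metadata(tags: Iterable[str]) -> bool:
    """True if a record's tags include any 6xx subject field, an 050 or an 082."""
    return any(tag.startswith("6") or tag in CLASSIFICATION_TAGS for tag in tags)


def subject_coverage(records: Iterable[Iterable[str]]) -> Tuple[int, int]:
    """Return (records with subject metadata, total records), streaming one record at a time."""
    with_subject = 0
    total = 0
    for tags in records:
        total += 1
        if has_subject_metadata(tags):
            with_subject += 1
    return with_subject, total


# Hypothetical sample: each record is just the list of MARC tags it contains.
sample = [
    ["001", "245", "650"],  # has an LCSH topical heading
    ["001", "245", "082"],  # Dewey number only
    ["001", "245", "260"],  # brief record, no subject data
    ["001", "100", "245"],  # no subject data
]

with_subject, total = subject_coverage(sample)
print(f"{with_subject} of {total} records ({with_subject / total:.0%}) have subject metadata")
# prints "2 of 4 records (50%) have subject metadata"
```

Because `subject_coverage` only walks through its input once, it could be fed a generator over a full catalogue export without holding all 4.2 million records in memory.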

Oops, I thought. That means when someone does a subject search in our existing catalogue, or uses the subject facet in our new Aquabrowser system, they won't retrieve about half of our items! I described this to Graeme Forbes, our Cataloguing and Metadata Services Manager, as catalogue dark matter. It's alarming to me that there are materials in our Library that are hidden from our customers because the metadata doesn't enable their retrieval.

Some of the reasons that we have incomplete records include:

  • brief records as a result of retroconversion
  • in days gone by, computer storage was VERY expensive, so some data, including subject and classification, was stripped out
  • past cataloguing policies that didn’t prioritise subject retrieval

This past Monday Graeme and I attended a workshop down at the British Library for the UK and Irish legal deposit libraries on resource discovery and metadata. I tentatively raised the issue of our dark matter with a colleague who has analysed the BL's catalogue, and was relieved to learn that they have similar concerns about their data; I had been very worried that this was a problem with our catalogue alone.

So what to do about it? Well, I dunno. It would be great to upgrade the records, but we're talking about approximately 2 million of them. To do this we could:
  • search for fuller versions of the records in cataloguing services such as COPAC or OCLC
  • have our staff do it, but they're too busy trying to catalogue the mass of publications that comes through the front door every week. Our Collection Services staff wouldn't like this approach either, because they are the ones who have to retrieve the books from the shelves for the cataloguers, and they wear out enough shoe leather already fetching books for our customers
  • or we could decide to do nothing and somehow explain the dark matter to our customers

If you know the answer to this problem, do let us know. I will nominate you for a Nobel Prize and buy you some chocolate, which is probably more tempting.
Until the next time.
Gill

2 Responses

  1. Hi Gill,

    Good to know you’re back in one piece (are you??)!

    Yes, dark matter is a good way to describe this. I remember quite a few of the policy changes along the way, and dropping LCSH for the majority of records was reluctantly agreed because of the enormous amount of material which never stops pouring in. What to do about it, though? Unless someone finds a way of matching our records with fully subject-indexed records from elsewhere at the push of a button (!) or we employ hundreds of cataloguers, the matter will remain unresolved (and dark).

    Jan

    p.s. check out:
    http://nlsopublog.blogspot.com/

  2. DISCLAIMER. I haven’t thought this through. Even after considering it for a couple of days, I still don’t know if I’m being serious. Bearing that in mind…

    why not give the job to random strangers who happen to have computers connected to the internet?

    I’m thinking along the lines of the reCAPTCHA initiative that Carnegie Mellon University is using to get the aforementioned random strangers to do the donkey work on book digitisation projects (see ‘What is reCAPTCHA?’ at http://recaptcha.net/learnmore.html).

    I appreciate that, as a national library, the NLS would have more concerns than most about the quality of the data in its catalogue. But if your ‘metadata volunteers’ had to register, were limited to picking from an existing controlled vocabulary (rather than being able to create new authorities), and had their work occasionally (and randomly) assessed by NLS staffers, you never know your luck. If you called it a metadata project and announced it on professional sites and discussion lists, the vast majority of volunteers would be professional cataloguers anyway (who else would want to catalogue at home and at weekends?!), so your copy of ‘Zen and the art of motorcycle maintenance’ definitely wouldn't get filed beside the Haynes workshop manuals (as I once saw it on the shelves of a discount bookshop).

    And if you called it a Web 2.0 project, you’d probably even get a grant for doing it. What could go wrong?

    Just mention me in your speech to the academy in Stockholm. But send the chocolate first.

    A
