Metadata
In order to make a collection of articles, it is necessary to know their titles, authors and publication information. This may not be particularly controversial, but the effort needed to provide this extra information (metadata, in the language of information scientists) can seem like a real burden when faced with the prospect of entering a backlog of publication information. Of course, a backlog of publications is exactly what a new user of an eprint archive faces, so the users who feel most daunted about using an unfamiliar piece of software for a new task are precisely the ones facing the heaviest data-entry load.
Some institutions may choose to provide support for users in this task (see under mediation), but whether or not this is the case, it is important to understand what information is required and why.
First of all, consider the case of a simple Web search engine such as Google. When you make your files available on the Web, you don't need to fill out a Google form describing their contents. Instead, Google automatically discovers the existence of each file and indexes all the words in it. This makes publishing new information on the Web much easier, but it also means that searching for a document is quite naive. By contrast, a search for an academic paper is usually performed against specific fields such as the author, title and year of publication. With Google, it is impossible to tell whether a word is part of the name of the article's author, part of the name of an author cited in the bibliography, or just a random part of the text. The string "1994" could be the year that the article was published, the year of a cited work, or a page number.
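To make the contrast concrete, here is a minimal Python sketch. It assumes a hypothetical `Record` type that keeps author, title and year as separate fields alongside the full text; it is not how Google or EPrints is actually implemented, it only shows why a fielded query is more precise than a whole-text match.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    """Hypothetical eprint record: metadata fields plus the full text."""
    authors: List[str]
    title: str
    year: int
    full_text: str

def full_text_search(records, term):
    """Naive search: match the term anywhere in the text, like a Web search."""
    return [r for r in records if term in r.full_text]

def year_search(records, year):
    """Fielded search: match only against the year-of-publication field."""
    return [r for r in records if r.year == year]

records = [
    Record(["Smith, J."], "Citation linking in digital libraries", 1994,
           "Copyright 1994. We describe a system for linking citations."),
    Record(["Jones, A."], "Open archives and self-archiving", 2002,
           "We build on the approach of Smith (1994)."),
]

print([r.title for r in full_text_search(records, "1994")])
# ['Citation linking in digital libraries', 'Open archives and self-archiving']
print([r.title for r in year_search(records, 1994)])
# ['Citation linking in digital libraries']
```

The 2002 paper matches the text search only because it cites a 1994 work; the fielded query is not fooled, but it can only work if someone has entered the year correctly.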
The purpose of accurate metadata is to make searches accurate, so that you can be confident that the article you are given is the one you are looking for, not just one that sounds like it. EPrints tries to minimise the information you are required to enter (and different sites will have different requirements for different purposes), but the following information is required:
Author names
Author names are perhaps the most important piece of metadata about an article, because the surname of the first author is one of the most significant distinguishing pieces of information about a paper. As a self-archiver, you will most likely be entering either your own name or that of one of your colleagues. Although it sounds patronising to emphasise it, please make sure you know how to spell this name! In particular, please be consistent about:
- initials: how many names do you have? Which initials do you record on your papers? Make sure you use the same initials in the paper and the metadata.
- prefixes: are you known as "de Souza" or "deSouza"?
- diacritics: do you use diacritical marks (e.g. "Müller") or a plain ASCII spelling (e.g. "Muller")?
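Why consistency matters can be seen in a minimal Python sketch. The deposits below are hypothetical, and the sketch assumes a system that groups papers by the author string exactly as entered. The `loose_key` function is a hypothetical normalisation (lowercase, remove spaces, strip diacritics), not a description of what EPrints does: it can merge "de Souza" with "deSouza" and "Müller" with "Muller", but it still cannot tell that "M." and "M. J." are the same person.

```python
import unicodedata
from collections import defaultdict

# Hypothetical deposits: two people, each entered with inconsistent spellings.
deposits = [
    ("de Souza, M.", "Paper A"),
    ("deSouza, M.", "Paper B"),
    ("de Souza, M. J.", "Paper C"),
    ("Müller, K.", "Paper D"),
    ("Muller, K.", "Paper E"),
]

def strip_diacritics(s):
    """Decompose accented letters and drop the combining marks ('ü' -> 'u')."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

def loose_key(name):
    """Hypothetical normalisation: strip diacritics, lowercase, remove spaces."""
    return strip_diacritics(name).lower().replace(" ", "")

def group_by(deposits, key):
    """Group paper titles under key(author_name)."""
    groups = defaultdict(list)
    for name, title in deposits:
        groups[key(name)].append(title)
    return dict(groups)

exact = group_by(deposits, key=lambda name: name)
loose = group_by(deposits, key=loose_key)

print(len(exact))  # 5 separate "authors" for 2 real people
print(loose)
# {'desouza,m.': ['Paper A', 'Paper B'], 'desouza,m.j.': ['Paper C'], 'muller,k.': ['Paper D', 'Paper E']}
```

Even with normalisation, Paper C is split off from Papers A and B because of the extra initial; only consistent data entry avoids that.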
Paper title
If possible, cut and paste the title of the paper directly from its contents. Some formatting may be problematic (for example, sub- and superscripts in a chemistry article, or italics in a mathematical expression). It is important to realise that the metadata is a database record: its purpose is searching, not printing. The best course of action is to provide text that reflects the meaning of the title, without trying to duplicate its appearance. However, explicit markup is conventional in some communities (e.g. physicists would naturally use TeX markup for mathematics), while in others it would be out of place (philosophers are unlikely to use explicit RTF codes).
Publication status
This one piece of information marks the difference between articles that have successfully been through the peer-review process and those that have not (or not yet). Please make sure you update this field once a paper has been accepted!
Year, issue number, page number
These three numbers are all significant in distinguishing citations of similar-sounding papers. The drawback is that they are not known until well after the paper has been accepted for publication, and hence a long time after the eprint has been deposited. Please make sure that you return to the eprint record and add this information when it becomes available. If your institution uses the eprint archive to drive its publication auditing or to produce staff CVs automatically, then this information will be crucial (and will save you the effort of having to provide it in other contexts).