What is a Controlled Vocabulary?

The term “Controlled Vocabulary” does not mean the same thing to everyone. So that it can be used without misunderstanding, this paper defines the term as a “considered list of values, designed to improve searchability”. A set of “rules of thumb” is provided for determining whether a given set of values is a Controlled Vocabulary, along with guidance on populating one.

What it is:
At time of writing, Wikipedia provides a nice, pithy definition:

In library and information science controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search.

The key point is that the list is "carefully selected": a person or group must deliberately decide on those words and phrases before they are used. A controlled vocabulary is not an automatically comprehensive set of all things of a given type, but a set of distinctions that are deemed useful.

This doesn't mean that a Controlled Vocabulary must be defined at a particular time and remain eternally unchanged. Such an approach can hinder information retrieval: as new concepts are added to the knowledge base, existing terms begin to be abused to cover them.

What it does mean is that any change to the list of terms requires deliberate consideration, and is not merely Business As Usual.

What are its characteristics?
There are certain features of Controlled Vocabularies (CVs) that are clear and uncontroversial:

• A field bound to a CV is constrained to contain only values from that CV.
• A CV is a finite list, and not a pattern.
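
The two characteristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names MEDIUM_CV, CALLSIGN_PATTERN, is_valid_medium and set_medium are hypothetical, and the terms in the list are examples rather than a recommended vocabulary.

```python
import re

# Hypothetical CV for a "medium" field: a finite, explicitly enumerated list.
MEDIUM_CV = frozenset({"TV", "Radio"})

# By contrast, a pattern admits an open-ended set of values, so it is not a CV.
# (Illustrative pattern only; it is not used for validation below.)
CALLSIGN_PATTERN = re.compile(r"[A-Z]{3,4}")


def is_valid_medium(value: str) -> bool:
    """Return True only if value is one of the listed CV terms."""
    return value in MEDIUM_CV


def set_medium(record: dict, value: str) -> dict:
    """Return a copy of record with its CV-bound field set.

    Values that are not in the CV are rejected rather than silently stored.
    """
    if not is_valid_medium(value):
        raise ValueError(f"{value!r} is not a term in the medium CV")
    return {**record, "medium": value}
```

With this sketch, set_medium({}, "Radio") succeeds, while set_medium({}, "Television") raises a ValueError: a near-synonym is not a term until someone deliberately adds it to the list.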

Consider an analogue-only monitoring organisation that treats TV and Radio as distinct from one another. It then starts receiving IPTV, DVB and DAB. Should these new entries be added to the CV that distinguished TV and Radio? The answer depends on how that CV is used. If it is used to select a reception device, then they probably should be. If, on the other hand, it is used to denote whether a source is audio only, or audio and video, then they should not be added.
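
The two readings of the TV and Radio example might be sketched as follows. The names RECEPTION_DEVICE_CV, CONTENT_TYPE_CV and content_type are hypothetical, and the has_video flag is an assumed property of a source introduced purely for illustration.

```python
# Reading 1 (assumed): the CV selects a reception device, so the new terms are added.
RECEPTION_DEVICE_CV = frozenset({"TV", "Radio", "IPTV", "DVB", "DAB"})

# Reading 2 (assumed): the CV records "audio & video" versus "audio only",
# so it keeps its original two terms.
CONTENT_TYPE_CV = frozenset({"TV", "Radio"})


def content_type(has_video: bool) -> str:
    """Map a source, however it is received, onto the unchanged content CV."""
    term = "TV" if has_video else "Radio"
    assert term in CONTENT_TYPE_CV
    return term
```

Under the second reading, a newly received source is tagged with an existing term according to what it carries, and the list itself is left alone.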

Wikipedia contributors, "Controlled vocabulary," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Controlled_vocabulary&oldid=345857986 (accessed March 12, 2010).

