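How to create your own vocabulary

Let's build a very simple vocabulary to describe obelisks (tall, narrow, four-sided tapering monuments that end in a pyramid-like shape), so that Cleopatra and Caesar can share information about their personal collections.

From simple English to a graphical representation

First, let's state in plain English what we want our vocabulary to express: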

  • An obelisk is owned by a person.
  • An obelisk is built by a sculptor.
  • An obelisk has a height, which is a numerical value.

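The key elements of these sentences will become the “terms” of our vocabulary. We can identify two types of terms: the things we talk about (e.g. obelisk, sculptor) and their properties (e.g. height, built by). Let us represent them graphically, putting things in bubbles and properties in squares:

Figure: The obelisk vocabulary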

From simple English to RDF

Using IRIs to identify everything

RDF is the language used to build vocabularies for use across the Web (aka Linked Data). In RDF, everything is identified by IRIs. These are simply standard web URIs (Uniform Resource Identifiers), but a little more modern in that they can contain characters from a more internationalised character set (e.g. ‘α’, ‘δ’, or ‘ό’); for more information, see Wikipedia.

First, we need an IRI to denote (or identify) our new vocabulary (as we said, everything in RDF is identified by an IRI!), for example http://w3id.org/obelisk/. From there, each term in our simple English example gets its own IRI under that namespace, for example http://w3id.org/obelisk/Obelisk for ‘obelisk’ and http://w3id.org/obelisk/ownedBy for ‘owned by’.

Identifiers quickly become unpleasant to read when they are full IRIs, so RDF introduces the notion of prefixes (a simple concept borrowed from XML namespaces). From now on we’ll use the prefix obelisk: to stand in for our vocabulary identifier http://w3id.org/obelisk/, so that http://w3id.org/obelisk/Obelisk becomes obelisk:Obelisk, http://w3id.org/obelisk/ownedBy becomes obelisk:ownedBy, and so on.
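
To make the shorthand concrete, here is a minimal sketch in Turtle, the RDF syntax used for the examples in this tutorial. The ex: prefix and the obelisk1 and cleopatra identifiers are invented for this illustration only; they are not part of the vocabulary.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix ex: <http://example.org/collection/> .

# One statement written with full IRIs...
<http://example.org/collection/obelisk1> <http://w3id.org/obelisk/ownedBy> <http://example.org/collection/cleopatra> .

# ...and the very same statement written with prefixes.
ex:obelisk1 obelisk:ownedBy ex:cleopatra .
```

Both lines state the same fact, so an RDF parser would treat them as a single statement.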

Things, and Properties of Things

From the above we can see that we want to describe both ‘Things’ (e.g. Obelisks and Sculptors), and the ‘Properties’ of those things (e.g. their height, or who owns them). RDF allows us to explicitly distinguish between these by referring to ‘things’ as Classes, and ‘properties’ as, well, Properties!

Defining Classes of Things

In RDF, the general things that we can talk about are called Classes. Therefore everything that went into a bubble in our diagram above is a Class, so we could add the following statements to our vocabulary: obelisk:Obelisk is a Class, obelisk:Person is a Class, and obelisk:Sculptor is a Class.

These sentences are structured exactly like the ones in the rest of our vocabulary: a thing (e.g. obelisk:Obelisk), a relationship (‘is a’), and another thing (‘Class’).

What we need now are IRIs for the “is a” property and the “Class” Class. Fortunately, these are defined in the RDF and RDFS vocabularies: “is a” is defined by rdf:type, and “class” by rdfs:Class. Therefore, we can now write:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class .
obelisk:Person rdf:type rdfs:Class .
obelisk:Sculptor rdf:type rdfs:Class .

And congratulations, you’ve just created your first snippet of valid RDF! This particular RDF syntax is called Turtle. There are many other standardized syntaxes, but we don’t need to cover them in this tutorial.

Defining properties of things

The properties of things in RDF are called Properties (how convenient). Therefore, as we did for Classes, we might write: obelisk:ownedBy is a Property, obelisk:builtBy is a Property, and obelisk:height is a Property.

We already know that ‘is a’ is identified by the IRI rdf:type, and ‘Property’ is identified by rdf:Property, so we can now express these statements in RDF as well.

Which leads to our vocabulary looking like this:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class .
obelisk:Person rdf:type rdfs:Class .
obelisk:Sculptor rdf:type rdfs:Class .

obelisk:ownedBy rdf:type rdf:Property .
obelisk:builtBy rdf:type rdf:Property .
obelisk:height rdf:type rdf:Property .
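
As a rough illustration of how these terms might be used, the following sketch shows how Cleopatra could describe one obelisk from her collection. Everything under the ex: prefix (http://example.org/collection/) is made up for this example, as is the height value.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/collection/> .

# Hypothetical individuals, each typed with one of our Classes.
ex:cleopatra rdf:type obelisk:Person .
ex:sculptor1 rdf:type obelisk:Sculptor .
ex:obelisk1 rdf:type obelisk:Obelisk .

# Our three Properties in use.
ex:obelisk1 obelisk:ownedBy ex:cleopatra .
ex:obelisk1 obelisk:builtBy ex:sculptor1 .
# Turtle reads a bare number such as 21 as an integer (an invented value).
ex:obelisk1 obelisk:height 21 .
```

Note that the vocabulary and the data that uses it live under different IRIs: the vocabulary defines the terms, while each collection owner can publish statements under their own namespace.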

Adding information for humans

Using labels and comments

So far we have created identifiers that are primarily intended for machines. Although it is certainly not recommended, the IRIs themselves do not need to be meaningful to humans at all. For example, the following would technically be an equivalent vocabulary:

@prefix o: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

o:C001 rdf:type rdfs:Class .
o:C002 rdf:type rdfs:Class .
o:C003 rdf:type rdfs:Class .

o:p001 rdf:type rdf:Property .
o:p002 rdf:type rdf:Property .
o:p003 rdf:type rdf:Property .

Even if we don’t want our vocabulary to look like this, the point is that it’s really useful to also provide human-readable descriptions of the terms in our vocabularies. To do so we’ll use the properties rdfs:label to add a human-readable label for the term identified by the IRI, and rdfs:comment to add a few sentences describing what is meant by the term in the context we use it. This could lead to something like this:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class ;
    # A label for readability...
    rdfs:label "Obelisk" ;
    # ... and a more descriptive comment for a fuller explanation of this 'thing'.
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .

obelisk:Sculptor rdf:type rdfs:Class ;
    rdfs:label "Sculptor" ;
    rdfs:comment "An artist who sculpts obelisks." .

obelisk:ownedBy rdf:type rdf:Property ;
    rdfs:label "owned by" ;
    rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .

obelisk:builtBy rdf:type rdf:Property ;
    rdfs:label "built by" ;
    rdfs:comment "Relationship between an obelisk and the person who built it." .

obelisk:height rdf:type rdf:Property ;
    rdfs:label "height" ;
    # Note: so far we didn't specify any units for the height (we'll fix this properly later), but we can however provide a hint in the comment.
    rdfs:comment "The distance from the ground to the highest point of the obelisk, in meters." .

Please note that we are using a shortcut provided by the Turtle syntax to avoid repeating the thing that we talk about when adding multiple properties to it (e.g. obelisk:ownedBy in the next snippet):

  • The long version:
    obelisk:ownedBy rdf:type rdf:Property .
    obelisk:ownedBy rdfs:label "owned by" .
    obelisk:ownedBy rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .
    
  • The shortcut:
    obelisk:ownedBy rdf:type rdf:Property ;
      # We removed the repetitions of obelisk:ownedBy, and replaced the end of
      # line by ; instead of .
      rdfs:label "owned by" ;
      rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .
    

Adding multilingual support

So far, all our labels and comments are written in English, yet there is no explicit indication that the text is actually in English within the vocabulary itself. To make the language of any text explicit, RDF provides the concept of a language tag, which can be placed directly after the text string itself. The value of these language tags is defined by the international IETF standard BCP-47 - for example, we can use @en for English, or @fr for French.

This example shows how easy it is to explicitly provide both English and French labels and comments for terms in our vocabulary:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class ;
    # A label explicitly in English...
    rdfs:label "Obelisk"@en ;
    # ... as well as a comment in English...
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top."@en ;
    # ...and the same comment in French...
    rdfs:comment "Un obélisque est un pilier à quatre côtés dont le sommet est en forme de pyramide."@fr .

obelisk:Sculptor rdf:type rdfs:Class ;
    rdfs:label "Sculptor"@en ;
    rdfs:label "Sculpteur"@fr ;
    rdfs:comment "An artist who sculpts obelisks."@en ;
    rdfs:comment "Un artiste qui taille des obélisques"@fr .
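
Turtle also offers a comma shortcut for repeating the same property with several values, which can keep multilingual labels compact. A minimal sketch, intended to be equivalent to the two rdfs:label lines for obelisk:Sculptor above:

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Sculptor rdf:type rdfs:Class ;
    # A comma separates several values of the same property.
    rdfs:label "Sculptor"@en , "Sculpteur"@fr .
```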

Of course, for many text values the concept of ‘language’ is meaningless. For instance, Social Security Numbers in the United States are often written as strings because they contain hyphens (e.g. ‘123-12-7890’), and a username (or nickname) will most often not have any associated language. For these common use cases, the expected approach is simply not to specify a language tag at all.
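
As a small sketch of that contrast, the snippet below gives a person language-tagged names next to an untagged username. The ex: and people: prefixes, the ex:username property, and the values are all hypothetical, chosen only for illustration.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/terms/> .
@prefix people: <http://example.org/people/> .

# A name can reasonably carry a language tag...
people:cleopatra rdfs:label "Cleopatra"@en .
people:cleopatra rdfs:label "Cléopâtre"@fr .

# ...whereas a username is language-neutral, so it gets no tag.
people:cleopatra ex:username "cleo" .
```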

Adding some metadata

The finishing touch to this vocabulary is to add some metadata about the vocabulary itself, so that people we share this vocabulary with (or who search for it, or who just stumble across it on the web) can know who created it, when, and what its intended purpose is, without having to go through all the details of the individual terms contained within it.

We already decided that the IRI of our vocabulary would be http://w3id.org/obelisk/, so this is the identifier we are going to use in RDF to say stuff about the vocabulary itself. In Linked Data terminology a vocabulary is called an owl:Ontology, so the first thing to say is:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# `obelisk:` is equivalent to http://w3id.org/obelisk/
obelisk: rdf:type owl:Ontology .

# The remainder of the vocabulary is unchanged.
obelisk:Obelisk a rdfs:Class ;
    rdfs:label "Obelisk" ;
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
# ...
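
Note that the snippet above writes a where earlier snippets wrote rdf:type. In Turtle, a is a built-in abbreviation for rdf:type, so the two statements in this short sketch mean exactly the same thing:

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Long form, as used in the earlier sections.
obelisk:Obelisk rdf:type rdfs:Class .

# Short form: 'a' stands for rdf:type.
obelisk:Obelisk a rdfs:Class .
```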

Adding a description

Much like we described each term with human-friendly labels and comments, we can now add a title (using the property dcterms:title) and a description (using the property dcterms:description) to our vocabulary. To make it easier to reuse, we can also indicate a suggested or preferred prefix (vann:preferredNamespacePrefix) and a suggested or preferred IRI (vann:preferredNamespaceUri) (since multiple IRIs may point to the same vocabulary).

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix vann: <http://purl.org/vocab/vann/> .

obelisk: rdf:type owl:Ontology ;
    dcterms:title "Obelisk ontology" ;
    # The description can be a multi-line text.
    dcterms:description """
    The obelisk ontology aims at describing obelisks.
    """ ;
    vann:preferredNamespacePrefix "obelisk" ;
    vann:preferredNamespaceUri <http://w3id.org/obelisk/> .

# The remainder of the vocabulary is unchanged.
obelisk:Obelisk a rdfs:Class ;
    rdfs:label "Obelisk" ;
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
# ...
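
The introduction to this section also mentioned recording who created the vocabulary and when. One possible way to do that, sketched here under assumptions, is with the Dublin Core terms dcterms:creator and dcterms:created. The creator IRI and the date below are placeholders invented for illustration.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

obelisk: rdf:type owl:Ontology ;
    # Placeholder IRI identifying the author (not a real identifier).
    dcterms:creator <http://example.org/people/cleopatra> ;
    # Placeholder creation date, typed as an XML Schema date.
    dcterms:created "2020-01-01"^^xsd:date .
```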

Some simple naming conventions

You may have noticed some of the simple naming conventions used in our examples so far. These conventions are extremely common (but not universal!) across RDF vocabularies.

  • The basic convention is to use Camel Case for all your terms, e.g. ‘ownedBy’ or ‘builtBy’.
  • Capitalize the first letter of Class terms, e.g. Obelisk, or Sculptor.
  • Lower-case the first letter of Property terms, e.g. height or ownedBy.
  • Lower-case prefixes, e.g. @prefix obelisk: <...>.
  • Don’t use hyphens; use underscores instead, because this simplifies using the terms in some programming languages.

Reference

A reference version of this final vocabulary is available here, and you can experiment with the syntax using a live RDF validator.

Next step: publish your vocabulary on your Pod.