This wiki page describes language features available in Protege-OWL 3.4.7 and later releases. Earlier versions lack many of the features described here.    (C4C)

MappingMaster uses a domain-specific language (DSL) to define mappings from spreadsheet content to OWL ontologies. This language is based on the Manchester OWL Syntax, which is itself a DSL for describing OWL ontologies and supports the declarative specification of OWL axioms. Some example Manchester Syntax expressions can be found here.    (C4A)

For example, a Manchester Syntax declaration of an OWL named class Gum that is a subclass of a named class called Product can be written using a class declaration clause as:    (BBJ)

    Class: Gum SubClassOf: Product

The MappingMaster DSL extends the Manchester Syntax to support references to spreadsheet content in these declarations. MappingMaster introduces a new reference clause for referring to spreadsheet content. In this DSL, any clause in a Manchester Syntax expression that indicates an OWL named class, OWL property, OWL individual, data type, or a data value can be substituted with this reference clause. Any declarations containing such references are preprocessed and the relevant spreadsheet content specified by these references is imported. As each declaration is processed, the appropriate spreadsheet content is retrieved for each reference. This content can then be used in four main ways:    (BFX)

Using one of these approaches, each reference within an expression is thus resolved during preprocessing to a named OWL entity, a data type, or a data value. The resulting expression can then be executed by a standard Manchester Syntax processor.    (AXX)

Basic Use of References in Expressions    (BEM)

References in the MappingMaster DSL are prefixed by the character '@', which is generally followed by an Excel-style cell reference. In standard Excel cell notation, cells extend from A1 in the top left corner of a sheet to successively higher columns and rows, with letters referring to columns and numbers referring to rows. For example, a reference to cell A5 in a spreadsheet is written as follows:    (BP0)

    @A5

The above cell specification indicates that the reference is relative, meaning that if a formula containing the reference is copied to another cell, the row and column components of the reference are updated accordingly. An equivalent absolute reference, again adopting Excel notation, can be written as follows:    (BP2)

    @$A$5

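A sheet can also be specified by enclosing its name in single quotes and separating it from the cell reference with the '!' character. For example, cell A5 of a sheet named Sheet1 is referenced as:    (BS8)

    @'Sheet1'!A5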
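The three reference forms described above (relative, absolute with `$`, and sheet-qualified) can be modelled with a small parser. The following is a Python sketch for illustration, not MappingMaster's implementation: it assumes a reference is `@`, an optional quoted sheet name followed by `!`, then upper-case column letters and a row number, each optionally prefixed by `$`. The hypothetical `copy_reference` helper mimics the Excel copy behavior, moving only the relative parts.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: @ ['Sheet'!] [$]COLUMN [$]ROW, with upper-case column letters.
REFERENCE = re.compile(
    r"^@(?:'(?P<sheet>[^']+)'!)?"
    r"(?P<col_abs>\$?)(?P<col>[A-Z]+)"
    r"(?P<row_abs>\$?)(?P<row>[1-9][0-9]*)$"
)


@dataclass(frozen=True)
class CellRef:
    sheet: Optional[str]    # None means "the current sheet"
    column: int             # 1-based: A=1, Z=26, AA=27
    row: int                # 1-based row number
    column_absolute: bool   # column written with '$'
    row_absolute: bool      # row written with '$'


def column_number(letters: str) -> int:
    """Convert Excel column letters to a 1-based column number."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_reference(text: str) -> CellRef:
    m = REFERENCE.match(text)
    if m is None:
        raise ValueError(f"not a cell reference: {text!r}")
    return CellRef(
        sheet=m.group("sheet"),
        column=column_number(m.group("col")),
        row=int(m.group("row")),
        column_absolute=m.group("col_abs") == "$",
        row_absolute=m.group("row_abs") == "$",
    )


def copy_reference(ref: CellRef, d_col: int, d_row: int) -> CellRef:
    """Mimic copying a formula: relative parts move, '$'-anchored parts stay put."""
    return CellRef(
        sheet=ref.sheet,
        column=ref.column if ref.column_absolute else ref.column + d_col,
        row=ref.row if ref.row_absolute else ref.row + d_row,
        column_absolute=ref.column_absolute,
        row_absolute=ref.row_absolute,
    )
```

Copying `@A5` one column right and two rows down moves it to column 2, row 7 (B7), while `@$A$5` stays at A5.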

For example, in the following spreadsheet rows 4 to 6 of column B contain product categories; columns D to G of row 2 contain state identifiers, and the grid range D4 to G6 contains sales amounts.    (BEP)

http://swrl.stanford.edu/MappingMaster/1.0/ScreenShots/ProductSales.png    (BEN)

These references can then be used in MappingMaster's DSL to define OWL constructs using spreadsheet content.    (BCM)

For example, a MappingMaster expression to declare that a class FlavouredGum is a subclass of the class named by the contents of cell B4 can be written:    (BBL)

    Class: FlavouredGum SubClassOf: @B4

When processed, this expression will create an OWL named class using the contents of cell B4 ("Gum") as the class name and declare FlavouredGum to be its subclass. If the class Gum already exists, the subclass relationship will simply be established.    (BQE)

That is, references can be used both to define new OWL entities and to refer to existing ones.    (BQP)

A similar expression to declare that the class SalesItem is equivalent to the class named by the contents of cell B4 can be written:    (BBR)

    Class: SalesItem EquivalentTo: @B4

The Manchester Syntax also supports an individual declaration clause for declaring individuals; property values can be associated with the declared individuals using a facts subclause, which contains a list of property value declarations.    (BCH)

For example, an expression specifying that an individual created from the contents of cell D2 ("CA") has the value "California" for the data property hasStateName can be written:    (BCI)

    Individual: @D2 Facts: hasStateName "California"

Here, an individual CA will be created if necessary and given the string value "California" for the data property hasStateName.    (BEU)

Using the standard Manchester Syntax, annotation properties can also be associated with declared entities.    (BQF)

For example, an existing string-valued annotation property called hasSource can be used to associate the California individual declared above with its source document as follows:    (BQG)

Classes or properties can be annotated in the same way. For example, a class can be annotated with the hasSource annotation property as follows:    (C4W)

The Manchester Syntax also supports the use of OWL class expressions. In general, a class expression may occur anywhere a named class can occur.    (BCU)

For example, an expression that defines a necessary and sufficient condition for a class Sale, using the contents of cell D4 as the filler of an owl:hasValue restriction on the property hasAmount, can be written:    (AY3)

    Class: Sale EquivalentTo: hasAmount value @D4

In general, OWL entities named explicitly in a MappingMaster expression must already exist in the target ontology. In these examples, the classes Sale, SalesItem and FlavouredGum, and properties hasAmount, hasStateName and hasSource must already exist.    (BCO)

As mentioned, OWL entities specified through cell references are created on demand by default, though they may also refer to previously declared entities.    (BCP)
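The two rules above (entities named explicitly must already exist; entities obtained from cell references are created on demand) can be sketched with a toy in-memory class hierarchy. `ToyOntology` and `declare_subclass_of_cell` are hypothetical names used only for illustration; a real mapping works against the ontology loaded in Protege-OWL.

```python
from typing import Dict, Set


class ToyOntology:
    """Minimal stand-in for an ontology: class name -> set of superclass names."""

    def __init__(self, existing: Set[str]) -> None:
        self.superclasses: Dict[str, Set[str]] = {name: set() for name in existing}

    def require(self, name: str) -> str:
        # Entities named explicitly in an expression must already exist.
        if name not in self.superclasses:
            raise KeyError(f"{name} is not declared in the target ontology")
        return name

    def resolve_or_create(self, name: str) -> str:
        # Entities obtained from cell references are created on demand by default,
        # or reused if they already exist.
        self.superclasses.setdefault(name, set())
        return name


def declare_subclass_of_cell(onto: ToyOntology, subclass: str, cell_value: str) -> None:
    """Rough analogue of declaring `subclass` a subclass of the class named in a cell."""
    onto.superclasses[onto.require(subclass)].add(onto.resolve_or_create(cell_value))
```

Running the FlavouredGum example twice with B4 containing "Gum" creates Gum once and then simply re-establishes the same subclass relationship.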

Specifying the Type of a Reference    (BRO)

In the above state declaration example, reference @D2 clearly refers to an OWL individual. However, the type cannot always be inferred and ambiguities may arise regarding the type of the referenced entity. To deal with this case, explicit entity type specifications are supported.    (BP5)

Specifically, a reference may optionally be followed by a parenthesis-enclosed entity type specification that explicitly declares the type of the referenced entity. This specification can indicate that the entity is a named OWL class, an OWL object or data property, an OWL individual, or a data type. The MappingMaster keywords for these types are the standard Manchester Syntax keywords Class, ObjectProperty, DataProperty, and Individual, plus any XSD type name (e.g., xsd:int).    (BP6)

Using this specification, the above state declaration, for example, can be written:    (BP7)

    Individual: @D2(Individual) Facts: hasStateName "California"

A declaration of an individual from cell B5 with a hasSalary property value of type float taken from cell C5 can be specified as follows:    (BRH)

    Individual: @B5 Facts: hasSalary @C5(xsd:float)

If the hasSalary data property is already declared to be of type xsd:float, the explicit type qualification is not needed. A global default type can also be specified for data values; it applies when no explicit type is given in the reference and the type of the associated data property is unknown or unspecified.    (BRL)

In many cases, it is also desirable to specify the superclass or superproperty of an entity, or the classes an individual belongs to. While these relationships can be defined using standard Manchester Syntax expressions, doing so often requires multiple mapping expressions. To define them concisely, a reference may optionally be followed by a parenthesis-enclosed list of type names. Using this approach, the above state declaration, for example, can be written:    (CBD)

References to OWL properties and individuals can be qualified in the same way.    (CBF)

Name Resolution for OWL Entities    (BG3)

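A variety of name resolution strategies are supported when creating or referencing OWL entities from cells. The three primary strategies are rdf:ID encoding, which uses the cell contents as the entity's identifier; rdfs:label encoding, which stores the cell contents as the entity's label; and location encoding, which ignores the cell contents.    (BQM)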

With rdf:ID encoding, an OWL entity generated from a cell reference is assigned its rdf:ID directly from the cell contents. Obviously, this content must represent a valid identifier (spaces are not allowed in rdf:IDs, for example).    (BQI)

Using rdfs:label encoding, an OWL entity generated from a cell reference is given an automatically generated (and non-meaningful) URI, and its rdfs:label annotation value is set to the contents of the cell.    (BQJ)

With location encoding, an OWL entity generated from a cell reference is also given an automatically generated (and non-meaningful) URI, but in this case the cell contents are not used.    (BQK)

The default name encoding uses the rdfs:label annotation property. This default can be changed globally.    (BQL)
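A minimal sketch of the three encodings described above, assuming a hypothetical base IRI and a simple counter for generated names (the actual MappingMaster URI generation scheme is not described here). The rdf:ID validity check is deliberately simplified to rejecting empty values and whitespace.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

BASE = "http://example.org/onto#"   # hypothetical namespace for illustration
_generated = itertools.count(1)     # hypothetical source of opaque local names


@dataclass
class Entity:
    iri: str
    label: Optional[str] = None     # rdfs:label annotation value, if any


def make_entity(cell_value: str, encoding: str = "rdfs:label") -> Entity:
    if encoding == "rdf:ID":
        # The cell text becomes the local name, so it must be a valid identifier.
        # Simplified check: real rdf:ID values must be XML NCNames.
        if not cell_value or any(ch.isspace() for ch in cell_value):
            raise ValueError(f"{cell_value!r} is not a valid rdf:ID")
        return Entity(iri=BASE + cell_value)
    if encoding == "rdfs:label":
        # Opaque generated IRI; the cell text is kept as the label.
        return Entity(iri=f"{BASE}e{next(_generated)}", label=cell_value)
    if encoding == "mm:Location":
        # Opaque generated IRI; the cell text is not used at all.
        return Entity(iri=f"{BASE}e{next(_generated)}")
    raise ValueError(f"unknown encoding: {encoding}")
```

Label encoding is the forgiving choice here: "Chewing gum" is rejected as an rdf:ID because of the space, but is accepted unchanged as a label.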

A name encoding clause is provided to explicitly specify a desired encoding for a particular reference. As with entity type specifications, this clause is enclosed by parentheses after the cell reference. The keywords to specify the three types of encoding are mm:Location, rdf:ID, and rdfs:label.    (BPL)

Using this clause, rdf:ID encoding can be specified for the previous state example as follows:    (BPM)

    Individual: @D2(rdf:ID) Facts: hasStateName "California"

As mentioned, MappingMaster also supports entity creation in which cell values are ignored. In this case, the keyword mm:Location can be used in parentheses following a reference.    (BFC)

For example, an expression to create an individual for cell D4 while ignoring the contents of the cell can be written:    (BFE)

    Individual: @D4(mm:Location)

By default, OWL entity names are resolved or generated using the namespace of the currently active ontology. The language includes mm:prefix and mm:namespace clauses to override this default.    (BS1)

For example, an expression to indicate that an individual created or resolved from the contents of cell A2 (assuming rdfs:label resolution) should use the namespace identified by the prefix "clinical", can be written:    (BS2)

Similarly, an expression to indicate that it must use the namespace "http://clinical.stanford.edu/Clinical.owl#" can be written:    (BS4)

Explicit namespace or prefix qualification in references allows duplicate labels in an ontology to be disambiguated.    (BS6)
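The resolution order described above can be sketched as a label lookup restricted to one namespace: an explicit namespace wins, otherwise a prefix is looked up in a prefix table, otherwise the active ontology's namespace is used. The label index, the function name, and the precedence between namespace and prefix are assumptions for illustration; returning None stands in for "not found, so the entity could be created on demand".

```python
from typing import Dict, List, Optional


def resolve_label(
    label: str,
    index: Dict[str, List[str]],        # label -> IRIs of entities carrying that label
    active_namespace: str,
    prefixes: Optional[Dict[str, str]] = None,
    prefix: Optional[str] = None,
    namespace: Optional[str] = None,
) -> Optional[str]:
    """Find the single entity with `label` in the chosen namespace."""
    if namespace is None:
        if prefix is not None:
            namespace = (prefixes or {})[prefix]   # KeyError for an unknown prefix
        else:
            namespace = active_namespace
    matches = [iri for iri in index.get(label, []) if iri.startswith(namespace)]
    if len(matches) > 1:
        # Duplicate labels inside one namespace cannot be disambiguated this way.
        raise ValueError(f"label {label!r} is ambiguous in {namespace}")
    return matches[0] if matches else None
```

The same label can therefore name different entities in different namespaces, and qualification picks the intended one.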

Referring to OWL Entities in Expressions Using Annotation Values    (BPO)

To support direct references to annotation values in expressions, MappingMaster's DSL adopts the Manchester Syntax mechanism of enclosing these references in single quotes.    (BF5)

For example, if the OWL class Product has the rdfs:label annotation value 'A sellable product', it can be referred to as follows:    (BF2)

When this expression is processed, 'A sellable product' will be resolved through its annotation value to the class Product.    (BF3)

Processing Cell Content    (BPP)

The default behavior is to use the contents of the referenced cell directly when encoding a name. This default can be overridden using an optional value specification clause. This clause is indicated by the '=' character immediately after the encoding specification keyword, followed by a parenthesis-enclosed, comma-separated list of value specifications whose results are appended to each other. Value specifications can be cell references, quoted values, regular expressions containing capturing groups (see below), or built-in text processing functions.    (BQY)

For example, an expression that extends a reference to specify that the entity created from cell A5 is to use rdfs:label name encoding, and that the name is to be the value of the cell preceded by the string "Sale:", can be written as follows:    (BPC)

    Class: @A5(rdfs:label=("Sale:", @A5))

Value specification references are not restricted to the referenced cell itself and may indicate arbitrary cells. More than one encoding can also be specified for a particular reference so, for example, separate identifier and label annotation values can be generated for a particular entity using the contents of different cells.    (BRU)
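The concatenation rule for value specifications, and the use of several encodings for one entity, can be sketched as follows. The `Quoted` and `Cell` types, the dictionary-based sheet, and treating an empty cell as an empty string are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Quoted:
    text: str   # a quoted literal, e.g. "Sale:"


@dataclass(frozen=True)
class Cell:
    name: str   # a reference to any cell, e.g. "A5" or "B5"


ValueSpec = Union[Quoted, Cell]


def evaluate(specs: List[ValueSpec], sheet: Dict[str, str]) -> str:
    """Evaluate each specification in order and append the results."""
    return "".join(
        s.text if isinstance(s, Quoted) else sheet.get(s.name, "")  # empty cell -> "" (assumption)
        for s in specs
    )


def apply_encodings(encodings: Dict[str, List[ValueSpec]], sheet: Dict[str, str]) -> Dict[str, str]:
    """Compute each requested encoding (e.g. rdf:ID, rdfs:label) for a single entity."""
    return {kind: evaluate(specs, sheet) for kind, specs in encodings.items()}
```

With A5 holding "Gum", the label specification `[Quoted("Sale:"), Cell("A5")]` evaluates to "Sale:Gum", while the identifier can come from a different cell such as B5.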

The example above can be extended, for instance, to take the rdf:ID of generated classes from cell B5 as follows:    (BRV)

If the value list contains only a single value, the opening and closing parentheses can be omitted:    (BSA)

The language includes several built-in text processing functions that can be used in value specifications: mm:prepend, mm:append, mm:toLowerCase, mm:toUpperCase, mm:trim, mm:reverse, mm:replace, mm:replaceAll, and mm:replaceFirst. These functions take zero or more arguments and return a value. Supplied arguments may be quoted strings or references.    (BRB)

The mm:replace and mm:replaceAll functions follow the behavior of the corresponding methods in the standard Java String class.    (BRP)
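The sketch below gives rough Python analogues of the built-in functions, following the Java String semantics mentioned above: `replace` substitutes literal text, while `replaceAll` and `replaceFirst` interpret their first argument as a regular expression. The Python names are hypothetical, and taking the value being processed as the first parameter is an assumption.

```python
import re


def prepend(value: str, prefix: str) -> str:
    return prefix + value


def append(value: str, suffix: str) -> str:
    return value + suffix


def to_upper_case(value: str) -> str:
    return value.upper()


def to_lower_case(value: str) -> str:
    return value.lower()


def trim(value: str) -> str:
    # Java's trim() strips characters <= U+0020; Python's strip() strips Unicode
    # whitespace, so results can differ for unusual characters.
    return value.strip()


def reverse(value: str) -> str:
    return value[::-1]


def replace(value: str, target: str, replacement: str) -> str:
    # Like Java's String.replace: literal text, every occurrence.
    return value.replace(target, replacement)


def replace_all(value: str, regex: str, replacement: str) -> str:
    # Like Java's String.replaceAll: regular expression, every match.
    # Java writes group references in the replacement as $1; Python uses \1.
    return re.sub(regex, replacement, value)


def replace_first(value: str, regex: str, replacement: str) -> str:
    # Like Java's String.replaceFirst: regular expression, first match only.
    return re.sub(regex, replacement, value, count=1)
```

The distinction matters for characters such as '.': replacing "." with "," in "3.5" gives "3,5" with `replace`, but ",,," with `replace_all`, because an unescaped dot matches any character.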

For example, the mm:prepend method can be used as follows to simplify the above example:    (BRM)

The expression can be further simplified by omitting the explicit rdfs:label qualification if it is the default:    (C47)

An expression to convert the contents of cell A5 to upper case before label assignment in the previous example can be written:    (BR0)

The explicit first argument of a function can also be omitted if it refers to the value at the current location. The previous expression can thus also be written:    (BR7)

The language supports a data value encoding clause that allows a value specification clause to be used to assign data values. This clause has a form similar to the rdf:ID and rdfs:label name resolution clauses and is introduced by the keyword mm:DataValue. For example, an expression using this clause to create a string data value composed of the contents of cell A5 preceded by the string "Sale:" can be written:    (BQU)

If the parser can determine that the reference is a data value, explicit qualification can be omitted:    (C45)

For example, to remove all non-alphanumeric characters from a cell before assignment, the mm:replaceAll function can be used as follows:    (BRQ)

A similar approach can be used to selectively extract values from referenced cells. A regular expression capturing groups clause is provided and can be used in any position in a value specification clause. This clause consists of a quoted string enclosed in square brackets. For example, if cell A5 in a spreadsheet contains the string "Pfizer:Zyvox" but only the text following the ':' character is to be used in the label encoding, an appropriate capture expression could be written as:    (BPE)

Note that parentheses around the sub-expressions in a regular expression clause specify capture groups and indicate that the matched strings are to be extracted. In some cases, more than one group may be matched for a cell value, in which case the matched strings are extracted in the order that they are matched and are appended to each other.    (BQQ)
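The extraction rule can be sketched in Python, whose `re` syntax agrees with Java's `Pattern` for simple patterns like these. The sketch assumes the pattern is applied repeatedly across the cell value (as with Java's `Matcher.find()`) and that every group of every match is appended in order; `capture` is a hypothetical helper name.

```python
import re


def capture(value: str, pattern: str) -> str:
    """Extract capturing-group matches and append them in the order matched."""
    parts = []
    for match in re.finditer(pattern, value):
        # Groups that did not participate in a match are None and are skipped.
        parts.extend(g for g in match.groups() if g is not None)
    return "".join(parts)
```

For instance, `capture("Pfizer:Zyvox", r":(\S+)")` yields "Zyvox". Because appended groups are joined without a separator, separate capture expressions (such as `^(\S+) `, ` (\S) ` and `(\S+)$` for a forename, middle initial and surname) are the natural way to route each part to a different property.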

Capturing groups can also be used to generate data values. For example, if cell A2 in a spreadsheet has a person's forename, middle initial, and surname separated by a single space, three capturing expressions can be used to selectively extract each name portion and separately assign them to different properties as follows:    (BQS)

A similar example to separately extract two space-separated integers from a cell can be written as:    (BQW)

A more complex variant that converts floating-point numbers written with a decimal comma to use a decimal point can be written:    (BRC)

If the hasSalary property is already of type xsd:float then the explicit qualification is not required here.    (BRG)

Of course, the mm:replace method would also work here:    (BRE)
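The two approaches to the decimal-comma conversion can be compared in a short sketch. Both hypothetical helpers assume a plain value with a single decimal comma and no sign or thousands separators.

```python
import re


def comma_to_dot_via_capture(value: str) -> str:
    """Rebuild the number from its two captured parts with a '.' between them."""
    m = re.fullmatch(r"(\d+),(\d+)", value)
    if m is None:
        raise ValueError(f"unexpected number format: {value!r}")
    return m.group(1) + "." + m.group(2)


def comma_to_dot_via_replace(value: str) -> str:
    """Literal substitution of every comma, as with Java's String.replace."""
    return value.replace(",", ".")
```

Both turn "1234,56" into "1234.56". The capture version rejects input it does not expect, whereas the replace version converts every comma regardless of position.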

Capturing expressions can also be invoked via the mm:capturing function:    (BRS)

The syntax of capturing expressions follows that supported by the Java Pattern class.    (BQT)

Value processing functions can also be used outside of value specification clauses, but only if no value specification clause is used in the reference, and then only a single function can be used.    (BRW)

For example, assuming default rdfs:label encoding, the string "_MM" can be appended to a generated label as follows using the mm:append function:    (BRX)

Similarly, the mm:replace method can be used to replace commas with periods when processing data values:    (BRZ)

Iterating Over a Range of Cells in a Reference    (BG5)

Obviously, most mappings will not just reference individual cells but will instead iterate over a range of columns or rows in a spreadsheet. The wildcard character '*' can be used in references to refer to the current column, the current row, or both in an iteration. MappingMaster provides a graphical interface to specify these ranges. (They will soon be supported in the DSL.)    (BCV)

Example references using this wildcard notation include:    (AZT)

For example, an expression that iterates over the grid D4 to G6 to create an individual of class Sale for each cell can be written:    (BFI)

This expression can be extended to assign property values to these individuals:    (BFL)
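To make the wildcard semantics concrete, the following sketch iterates over the D4 to G6 grid of the example spreadsheet and resolves '*' to the current column or row. The row-by-row visiting order, the toy sheet representation, and the `state` and `category` output keys are assumptions for illustration; only hasAmount comes from the examples above.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Cell = Tuple[str, int]      # (column letters, row number)
Sheet = Dict[Cell, str]     # toy spreadsheet holding only non-empty cells


def col_to_num(letters: str) -> int:
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cells_in_range(first: Cell, last: Cell) -> Iterator[Cell]:
    """Visit every cell of a rectangular range, row by row (assumed order)."""
    for row in range(first[1], last[1] + 1):
        for col in range(col_to_num(first[0]), col_to_num(last[0]) + 1):
            yield num_to_col(col), row


def resolve(column: str, row: str, current: Cell) -> Cell:
    """Replace a '*' component with the current column or row of the iteration."""
    return (current[0] if column == "*" else column,
            current[1] if row == "*" else int(row))


def sales_rows(sheet: Sheet) -> List[Dict[str, Optional[str]]]:
    """One record per cell of D4:G6, gathering the values an expression could map."""
    records = []
    for current in cells_in_range(("D", 4), ("G", 6)):
        records.append({
            "cell": f"{current[0]}{current[1]}",
            "hasAmount": sheet.get(resolve("*", "*", current)),  # the grid cell itself
            "state": sheet.get(resolve("*", "2", current)),      # row 2, current column
            "category": sheet.get(resolve("B", "*", current)),   # column B, current row
        })
    return records
```

Each of the twelve grid cells yields one record, pairing its sales amount with the state identifier above it and the product category to its left.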

Missing Value Handling    (BPU)

To deal with missing cell values, default values can also be specified in references. A default value clause is provided to assign these values. This clause is indicated by the keyword mm:default and is followed by a parenthesis-enclosed, comma-separated list of value specifications. For example, the following expression uses this clause to indicate that the value “Unknown” should be used as the created class label if cell A5 is empty:    (BPV)

Additional behaviors are also supported to deal with missing cell values. The default behavior is to skip an entire expression if it contains any references with empty cells. Four keywords are supplied to modify this behavior. These keywords indicate that:    (BPX)

The last option allows processing of spreadsheets that may contain a large number of missing values. The option indicates that the language processor should, if possible, conservatively drop the sub-expression containing the empty reference rather than dropping the entire expression.    (BQ2)

Consider, for example, the following expression declaring an individual from cell A5 of a spreadsheet and associating a property hasAge with it using the value in cell A6:    (BQ3)

Here, using the default skip behavior, a missing value in cell A5 will cause the expression to be skipped. However, the process directive on the hasAge property value in cell A6 will instead drop only the sub-expression containing it if that cell is empty. So, if cell A5 contains a value and cell A6 is empty, the resulting expression will still declare an individual.    (BQ5)
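The difference between the skip and process behaviors in this example can be sketched as follows. The `IfEmpty` enum is a hypothetical stand-in for the MappingMaster keywords (the warning behavior is left out because its exact effect is not described here), and the output is plain Manchester-style text rather than OWL axioms.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class IfEmpty(Enum):
    SKIP = "skip"        # default: drop the whole expression
    PROCESS = "process"  # drop only the sub-expression holding the empty reference
    ERROR = "error"      # abort with an error


def individual_with_facts(
    sheet: Dict[str, str],
    subject_cell: str,
    facts: List[Tuple[str, str, IfEmpty]],   # (property, cell, policy)
    subject_policy: IfEmpty = IfEmpty.SKIP,
) -> Optional[str]:
    """Render an individual declaration, or None when the expression is skipped."""
    subject = sheet.get(subject_cell, "")
    if not subject:
        if subject_policy is IfEmpty.ERROR:
            raise ValueError(f"cell {subject_cell} is empty")
        return None  # the declared individual itself cannot be dropped, so skip everything
    rendered = []
    for prop, cell, policy in facts:
        value = sheet.get(cell, "")
        if not value:
            if policy is IfEmpty.ERROR:
                raise ValueError(f"cell {cell} is empty")
            if policy is IfEmpty.SKIP:
                return None
            continue  # PROCESS: conservatively drop just this fact
        rendered.append(f"{prop} {value}")
    text = f"Individual: {subject}"
    if rendered:
        text += " Facts: " + ", ".join(rendered)
    return text
```

With A5 filled and A6 empty, a process policy on hasAge still yields a bare individual declaration, whereas a skip policy discards the whole expression.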

Using a similar approach, finer-grained empty value handling is also supported to specify different empty value handling behaviors for mm:DataValue, rdf:ID and rdfs:label values. Here, the label directives are mm:ErrorIfEmptyLabel, mm:SkipIfEmptyLabel, mm:WarningIfEmptyLabel, and mm:ProcessIfEmptyLabel, with equivalent keywords for RDF identifier and data value handling. These are mm:ErrorIfEmptyID, mm:SkipIfEmptyID, mm:WarningIfEmptyID, mm:ProcessIfEmptyID and mm:ErrorIfEmptyDataValue, mm:SkipIfEmptyDataValue, mm:WarningIfEmptyDataValue, mm:ProcessIfEmptyDataValue.    (C4K)

One additional option is provided to deal with empty cell values. It targets the common case in many spreadsheets where a cell is supplied with a value and the empty cells below it are implied to have the same value. When these empty cells are processed, their location must be 'shifted' up to the nearest cell above that contains a value. For example, the following expression uses this keyword (mm:ShiftUp) to indicate that if cell A5 does not contain a value for the name of the declared class, the row number must be shifted upwards until a value is found:    (BQ7)

If no value is found, normal empty value handling is applied. Similar directives allow shifting down (mm:ShiftDown), left (mm:ShiftLeft), or right (mm:ShiftRight).    (BQ9)
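A sketch of the shifting rule, assuming a toy sheet keyed by (column number, row) and bounded by the largest occupied column and row. The direction names correspond to the upward shift described above and to mm:ShiftDown, mm:ShiftLeft and mm:ShiftRight; `shifted_value` is a hypothetical name, and returning None marks the case where normal empty value handling would apply.

```python
from typing import Dict, Optional, Tuple

Sheet = Dict[Tuple[int, int], str]   # (column number, row) -> cell text; empty cells omitted

STEPS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def shifted_value(sheet: Sheet, column: int, row: int, direction: str = "up") -> Optional[str]:
    """Return the first non-empty value found by stepping from (column, row)."""
    d_col, d_row = STEPS[direction]
    # Bound the walk by the occupied area (and the starting cell itself).
    max_col = max([column] + [c for c, _ in sheet])
    max_row = max([row] + [r for _, r in sheet])
    while 1 <= column <= max_col and 1 <= row <= max_row:
        value = sheet.get((column, row), "")
        if value:
            return value
        column += d_col
        row += d_row
    return None  # nothing found: normal empty value handling applies
```

A category written once in A2 and left blank in A3 and A4 is therefore picked up for all three rows when shifting up.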

Manchester Syntax Coverage    (B17)

The DSL supports arbitrary OWL class expressions and will ultimately support the entire Manchester Syntax. For the moment, because of limitations in the Manchester OWL Syntax parser in Protege-OWL 3.4, it supports only the following two additional constructs:    (B18)

Configuration Options    (B15)

A set of global defaults can be specified for entity types and name encoding. The language has a number of clauses to specify these defaults.    (B0R)

The following examples illustrate the use of these clauses together with the current defaults.    (B0S)

Summary    (B16)

The MappingMaster DSL effectively allows OWL entities to be created from, or referenced by, arbitrary spreadsheet content. More importantly, the use of the Manchester Syntax allows these entities to be related to each other in complex ways. Since the Manchester Syntax supports the full OWL specification, very complex interrelationships can be specified.    (B1Z)

Declaratively specifying mappings in this way has several advantages. Writing these mappings requires no programming or scripting expertise. The mappings can be shared easily using the MappingMaster plugin, which saves them in an OWL ontology, and they can be executed repeatedly on different spreadsheets with the same structure. Since MappingMaster is available as a plugin in Protégé-OWL, the results of these mappings can be examined immediately and the mappings modified as needed and immediately re-executed, speeding the development process. MappingMaster provides an interactive editor for the mapping DSL that supports on-the-fly entity name checking and dynamic expansion of entity references.    (AYJ)