XMLMaster is not available in the current release of Protege-OWL. It is planned for release in the 3.4.7 version.    (C1U)

Basic Language Features    (C1V)

The XMLMaster language is a superset of the Manchester Syntax. It extends the Manchester Syntax so that expressions can refer to XML content through XPath expressions, and it introduces a new reference clause to support these references.    (C1W)

Basic Reference Clause    (C1X)

A reference clause uses an XPath expression to indicate content in an XML document. In our DSL, this reference clause can substitute for any clause in a Manchester Syntax expression that indicates an OWL class, property, individual, data type, or data value. Reference clauses are prefixed with @ and are followed by an XPath expression.    (C1Y)

For example, a reference to the first book’s name element in a document is written:    (C1Z)

To return element content, the text function can be used:    (C21)
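As a rough illustration of what these two references select (this is not XMLMaster syntax), the following Python sketch evaluates the corresponding XPath with the standard library's ElementTree. The sample document is a hypothetical stand-in for the books document, and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the books document discussed in the text.
XML = """
<books>
  <book category="fiction"><name>Huckleberry Finn</name><price>12.50</price></book>
  <book category="reference"><name>OWL Primer</name><price>30.00</price></book>
</books>
"""

root = ET.fromstring(XML)

# Counterpart of the XPath /books/book[1]/name, evaluated relative to the root element.
first_name = root.find("book[1]/name")

element_content = first_name.text  # what the XPath text() function would return
element_name = first_name.tag      # the element's own name
```

ElementTree implements only a subset of XPath, so the sketch reads the text and tag from the selected node rather than calling XPath functions.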

The element name can be obtained in a similar way. The reference clause can be used in an expression to define OWL constructs using XML content. For example, the following expression takes the text in the first book element in Figure 1 and declares an OWL named class as a subclass of an existing Book class:    (C23)

This expression declares an OWL class named by the contents of the first book element ("Huckleberry Finn" in this case) and asserts that it is a subclass of class Book. If the class has previously been declared and is not already a subclass of Book, then the subclass relationship will be established. In this way, references can be used to define new OWL entities or to refer to existing entities.    (C25)
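The create-or-resolve behavior described here can be sketched with a toy in-memory ontology. This is a minimal illustration under stated assumptions: the dictionary representation and the `declare_subclass` helper are hypothetical and are not part of XMLMaster or Protege-OWL.

```python
from collections import defaultdict

# Toy ontology: class name -> set of direct superclass names.
superclasses = defaultdict(set)
superclasses["Book"]  # touching the key registers the existing Book class


def declare_subclass(name, parent):
    """Declare class `name` if it is new and assert that it is a subclass of `parent`.

    Returns True if the subclass relationship was newly added.
    """
    already_subclass = parent in superclasses[name]  # indexing creates the class if absent
    superclasses[name].add(parent)
    return not already_subclass


created = declare_subclass("Huckleberry Finn", "Book")   # new class, new relationship
repeated = declare_subclass("Huckleberry Finn", "Book")  # resolves the existing class
```

The second call leaves the ontology unchanged, which mirrors how a reference can either define a new entity or refer to an existing one.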

The language has default options to automatically extract either an element’s content or its name. As discussed later, this default can be changed globally. In the following examples, the default is to use an element’s content, so the text function qualification is omitted.    (C26)

A similar expression to declare an individual of type Book using the element contents as its name can be written:    (C27)

Of course, XPath expressions can match multiple elements. If the item selector is omitted from the above expression, all book elements are returned, and individuals will be declared for every book element found, as in the following expression:    (C29)
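The effect of the item selector can be seen by evaluating both paths directly. The sketch below uses Python's ElementTree on a hypothetical books document in which each book has a name sub-element; that layout is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical books document; the element layout is assumed.
XML = """
<books>
  <book><name>Huckleberry Finn</name></book>
  <book><name>OWL Primer</name></book>
  <book><name>Moby Dick</name></book>
</books>
"""

root = ET.fromstring(XML)

# With an item selector: only the first book element matches.
with_selector = [b.findtext("name") for b in root.findall("book[1]")]

# Without an item selector: every book element matches, so one
# individual would be declared per match.
without_selector = [b.findtext("name") for b in root.findall("book")]
```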

The Manchester Syntax supports a facts clause for associating property values with individuals. This clause contains a list of property value declarations. For example, the following expression specifies that an individual created from the first book element obtains a value for a data property price from the associated price element and a value for a category property from its category attribute:    (C2B)

As can be seen, relative references are used here to refer to the price element and the category attribute. The language interprets relative references in terms of the most recent non-relative reference in an expression. As with element text qualification, value qualification is the default, and can be omitted for attribute nodes.    (C2D)
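Resolution of relative references against the most recent non-relative reference can be pictured as follows. This is a sketch only: the document, the property names, and the `facts` dictionary are assumptions, and ElementTree stands in for the XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical books document with a price sub-element and a category attribute.
XML = """
<books>
  <book category="fiction"><name>Huckleberry Finn</name><price>12.50</price></book>
</books>
"""

root = ET.fromstring(XML)

# The non-relative reference: the first book element.
book = root.find("book[1]")

# Relative references are evaluated with that element as the context node.
facts = {
    "price": float(book.findtext("price")),  # relative reference to the price element
    "category": book.get("category"),        # relative reference to the category attribute
}
```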

Any XPath expression can be used in a reference. For example, expressions can select nodes that meet particular criteria. Using this approach, the previous example can be modified to select books with a price over $25:    (C2E)

Similarly, the XPath expression can be modified to select only the first three book elements in the XML document:    (C2G)
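In XPath 1.0, these two selections could be written as `/books/book[price > 25]` and `/books/book[position() <= 3]`. Python's ElementTree supports only a subset of XPath and cannot evaluate those predicates, so the sketch below emulates them in plain Python. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical books document with four entries.
XML = """
<books>
  <book><name>Huckleberry Finn</name><price>12.50</price></book>
  <book><name>OWL Primer</name><price>30.00</price></book>
  <book><name>Moby Dick</name><price>27.95</price></book>
  <book><name>Walden</name><price>9.99</price></book>
</books>
"""

books = ET.fromstring(XML).findall("book")

# Emulates the predicate [price > 25].
over_25 = [b.findtext("name") for b in books if float(b.findtext("price")) > 25]

# Emulates the predicate [position() <= 3]; XPath positions are 1-based.
first_three = [b.findtext("name") for b in books[:3]]
```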

Document structural information can also be recorded. For example, the position of a book element in a document can be retrieved using the XPath position and count functions. Maintaining positional information from XML documents is often essential, as OWL has no native list support. Summary information about documents, such as the number of elements of a particular type, can be recorded in a similar way using XPath counting functions.    (C2I)
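A minimal sketch of the structural information involved, again using ElementTree on a hypothetical document: it records each book's 1-based document position (what XPath `position()` reports within `/books/book`) and the total number of book elements (what `count(/books/book)` would return).

```python
import xml.etree.ElementTree as ET

# Hypothetical books document.
XML = """
<books>
  <book><name>Huckleberry Finn</name></book>
  <book><name>OWL Primer</name></book>
  <book><name>Moby Dick</name></book>
</books>
"""

books = ET.fromstring(XML).findall("book")

# 1-based document order of each book, keyed by its name.
positions = {b.findtext("name"): i for i, b in enumerate(books, start=1)}

# Summary information: number of book elements in the document.
book_count = len(books)
```

Storing the position as a property value is one way to preserve ordering that OWL itself does not represent.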

XMLMaster supports the full range of Manchester Syntax expressions. For example, the standard Manchester Syntax annotations clause can associate annotation properties with declared entities. Using this clause, a string annotation property called source can be attached to a declared book individual as follows:    (C2J)

OWL class and property expressions can also be used. In general, a class or property expression may occur anywhere a named class or property can occur. The following expression defines conditions for a class SalesItem; it uses the contents of a book element’s price and name sub-elements:    (C2L)

Using this approach, any OWL axiom can be declared using the appropriate Manchester Syntax clause, with XPath references used in these clauses to extract XML content.    (C2N)

Reference Mapping Directives    (C2O)

The basic reference outlined above can deal with straightforward mappings. In many cases, however, references need to be qualified to resolve ambiguity or specify additional processing directives. XMLMaster provides a directives clause to support this specification (Figure 2).    (C2P)

The type of OWL entity in a reference cannot always be inferred. To deal with this case, our language supports explicit entity type specifications. A reference may be followed by a parentheses-enclosed entity type specification to explicitly declare the type of referenced entity. This specification can indicate that the entity is a named OWL class, an OWL object or data property, an OWL individual, or a data type. The keywords to specify the types are the standard Manchester Syntax keywords Class, ObjectProperty, DataProperty, and Individual, plus any XSD type name (e.g., xsd:int). The following uses this specification to write the book individual declaration above:    (C2Q)

In many cases, specifying the superclass, super property, individual class membership, or data type of referenced entities is also desired. While these relationships can be defined using standard Manchester Syntax expressions, doing so often entails the use of multiple mapping expressions. To concisely support defining these relationships, a reference may be followed by a parentheses-enclosed list of type names. Using this approach, the above book declaration can be written:    (C2S)

Type specifications can themselves be references. Super properties, individual class membership, and data types can be specified in the same way.    (C2U)

Global Mapping Directives    (C2V)

XMLMaster supports several global processing directives to specify default configuration options for the mapping process. They were designed to allow specification of common defaults for a set of mappings so that they do not have to be repeated in each expression.    (C2W)

The most common default controls element processing. This default specifies whether an element’s content or name is automatically extracted for elements resolved in an XPath expression. Thus, explicit text or name function qualification is not needed. Other defaults include the ability to declare a default namespace for both source XML documents and generated OWL entities. Prefix-to-namespace mappings can also be specified. In addition, directives control the handling of missing values in documents. In the default case, if an XPath expression evaluates to an empty value, the clause containing it is skipped. However, in some cases, users may wish to generate warnings or terminate the mapping when values are missing.    (C2X)

Directives are also provided to deal with references to OWL entities. For example, a directive can be set to indicate that an error should be thrown if a name refers to an existing entity in the target ontology. There is also a directive to indicate that an error should be thrown if the name does not refer to an existing entity. A related option deals with potential ambiguity introduced by annotation value references. It can be set to produce an error if more than one existing OWL entity could be named by the value.    (C2Y)

Our language provides an option specification clause for each option type. The general form of this clause is a keyword followed by a value. For example, the default name encoding for all mappings can be written:    (C2Z)

Advanced Language Features    (C31)

The language features outlined above support basic mappings of XML documents to OWL. Additional features are required for documents with unstructured text and missing elements. There is also a need for fine-grained control of the names and namespaces of created OWL entities.    (C32)

OWL Entity Name Encoding and Resolution    (C33)

Users can employ a variety of name-encoding and resolution strategies when they are creating or resolving OWL entities. The primary strategies are to use direct URI-based names (equivalent to using rdf:about or rdf:ID clauses in an RDF serialization of OWL) or rdfs:label annotation values. With rdf:ID encoding, an OWL entity generated from a reference is assigned its rdf:ID from the referenced content. If the content does not represent a fully qualified URI or a prefixed name, it is appended to a URI representing the namespace of the active ontology. Clearly, when using rdf:ID encoding, the content must represent a valid identifier; spaces are not allowed, for example. With rdfs:label encoding, the generated OWL entity is given an automatically generated (and non-meaningful) URI and its rdfs:label annotation value is set to the content specified by the XPath expression. A third encoding type is provided to support the case where the actual contents of the element are to be ignored. In this case, the generated OWL entity is again given an automatically generated URI, but its label is not assigned. The element's location in the XML document is used to track it.    (C34)
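The three encoding strategies can be compared in a small sketch. Everything here is an assumption made for illustration: the `encode` helper, the dictionary it returns, and the generated URI scheme are hypothetical, and the sketch does not expand prefixed names. Only the encoding keywords and the example namespace come from the text.

```python
import itertools
import re

NAMESPACE = "http://books.edu/Books.owl#"  # namespace of the active ontology (example value)
_counter = itertools.count(1)


def _generated_uri():
    """Return an automatically generated, non-meaningful URI (hypothetical scheme)."""
    return f"{NAMESPACE}entity_{next(_counter)}"


def encode(content, encoding, location=None):
    """Name an entity from referenced content using one of three encodings."""
    if encoding == "rdf:ID":
        if re.search(r"\s", content):
            raise ValueError("rdf:ID content must be a valid identifier (no spaces)")
        # Fully qualified URIs are used as-is; anything else is appended to the namespace.
        uri = content if "://" in content else NAMESPACE + content
        return {"uri": uri, "label": None, "location": None}
    if encoding == "rdfs:label":
        return {"uri": _generated_uri(), "label": content, "location": None}
    if encoding == "mm:Location":
        # Content is ignored; the entity is tracked by its location in the document.
        return {"uri": _generated_uri(), "label": None, "location": location}
    raise ValueError(f"unknown encoding: {encoding}")
```

For example, `encode("HuckleberryFinn", "rdf:ID")` yields a URI in the ontology namespace, while `encode("Huckleberry Finn", "rdfs:label")` keeps the space because the text is stored only as a label.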

The default name encoding uses the rdfs:label annotation property. As discussed, this default may also be changed globally.    (C35)

A name encoding clause explicitly specifies a desired encoding. As with entity type specifications, the clause is enclosed by parentheses after the XPath expression. The keywords to specify the encoding types are mm:DataValue, rdf:ID, rdfs:label, and mm:Location. The following is a specification of rdf:ID encoding for the previous book example using this clause:    (C36)

The default behavior is to use the text specified by the XPath expression when encoding a name. However, the text can first be processed with an optional value specification clause. This clause is indicated by the ‘=’ character after the encoding specification keyword, and is followed by either a single value specification or a comma-separated list of value specifications in parentheses. Value specifications are a quoted string, a reference, or a function.    (C38)

XMLMaster includes a predefined set of functions for manipulating text. These include mm:prepend, mm:append, mm:trim, mm:replace, and mm:replaceAll. These functions take zero or more arguments and return a value. Arguments may be quoted strings, references, or functions. For example, the following is an expression that extends the earlier book class declaration to specify rdfs:label name encoding. It specifies that the extracted name should be preceded by the underscore character:    (C39)

A similar declaration that uses the mm:trim function to strip leading and trailing spaces can be written:    (C3B)

Here, the content specified by the XPath expression is the implied argument to the trim function. It is processed by the function and then assigned to the class’s label. The rdfs:label name-encoding specification is omitted because it is the default.    (C3D)
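The effect of these two functions on the extracted text can be sketched in Python. The helper names `mm_prepend` and `mm_trim` are hypothetical stand-ins for mm:prepend and mm:trim, and their argument order is an assumption; in the DSL the referenced content is passed implicitly.

```python
def mm_prepend(prefix, value):
    """Stand-in for mm:prepend: put `prefix` in front of the extracted value."""
    return prefix + value


def mm_trim(value):
    """Stand-in for mm:trim: strip leading and trailing whitespace."""
    return value.strip()


# Content as it might be extracted by the XPath expression.
extracted = "  Huckleberry Finn  "

label_trimmed = mm_trim(extracted)
label_prepended = mm_prepend("_", label_trimmed)
```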

More than one encoding can be specified for a particular reference. In this way, separate identifier and label annotation values can be generated for a particular entity using different XPath expressions.    (C3E)

By default, OWL entity names are resolved or generated using the namespace of the active ontology. XMLMaster includes prefix and namespace directives to override this default behavior. These directives use the keywords mm:prefix and mm:namespace, and are followed by a quoted string. For example, the following is an expression indicating that a book individual created or resolved in the earlier expressions should use the namespace identified by the prefix literary:    (C3F)

Similarly, the following is an expression indicating that it must use the namespace "http://books.edu/Books.owl#":    (C3H)

Explicit namespace or prefix qualification also allows disambiguation of duplicate labels in an ontology.    (C3J)

Processing of Element and Attribute Content    (C3K)

Text within elements or attribute values in XML documents frequently requires processing. As shown, some of the functions provided by our language support basic processing steps such as stripping spaces. More complex processing is also possible. For example, to remove all $ characters, the mm:replaceAll function can be used:    (C3L)

Functions can also be nested. For example, the expression can be rewritten to trim the text after substitution:    (C3N)
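What the nested call computes can be sketched in Python. The helpers below are hypothetical stand-ins for mm:replaceAll and mm:trim; the argument order and the use of Python's `re` module in place of Java regular expressions are assumptions.

```python
import re


def mm_replace_all(value, pattern, replacement):
    """Stand-in for mm:replaceAll: regex-based replacement of every match."""
    return re.sub(pattern, replacement, value)


def mm_trim(value):
    """Stand-in for mm:trim: strip leading and trailing whitespace."""
    return value.strip()


raw = " $ 12.50 "  # hypothetical price text with a currency symbol and stray spaces

# Substitution alone removes the symbol but leaves the surrounding spaces.
no_dollar = mm_replace_all(raw, r"\$", "")

# Nesting the calls trims the text after substitution.
cleaned = mm_trim(mm_replace_all(raw, r"\$", ""))
```

The dollar sign is escaped because it is a metacharacter in regular expressions.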

Extracting portions of semi-structured content is often required. XMLMaster provides a regular expression capturing group clause to support this extraction. This clause can be used in any position in a value specification clause. The clause is either contained in a quoted string enclosed by square brackets or specified by an mm:capturing function.    (C3P)

For example, if a name element has a person's forename and surname separated by a single space, two capturing expressions can be used to selectively extract each name portion and assign them separately to different properties:    (C3Q)

Parentheses around sub-expressions in a regular expression clause specify capturing groups and indicate that matched strings are to be extracted. In some cases, more than one group may be matched for a value. In this case, the strings are extracted in the order that they are matched and are appended to each other.    (C3S)
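The extraction rule, in which matched groups are taken in order and appended to each other, can be sketched as follows. The `capture` helper is hypothetical, it assumes the pattern must match the whole value, and it uses Python's `re` module rather than the Java Pattern class; the simple patterns shown behave the same in both.

```python
import re


def capture(pattern, value):
    """Return the concatenation of all capturing groups, in match order."""
    match = re.fullmatch(pattern, value)
    if match is None:
        return ""  # no match: nothing is extracted
    return "".join(group for group in match.groups() if group is not None)


name = "Mark Twain"  # hypothetical content of a name element

forename = capture(r"(\S+)\s\S+", name)       # one group: the first word
surname = capture(r"\S+\s(\S+)", name)        # one group: the second word
initials = capture(r"(\S)\S*\s(\S)\S*", name)  # two groups, appended in order
```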

A more complex variant, which converts floating-point numbers written with a decimal comma to ones written with a decimal point, is:    (C3T)

The mm:replace method would also work:    (C3V)
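Both routes to the same result can be sketched in Python. This is an illustration under assumptions: the sample value is hypothetical, and the regular expression shown is one plausible way to do the conversion, not necessarily the pattern used in the XMLMaster example.

```python
import re

raw = "12,50"  # hypothetical price written with a decimal comma

# Regex route: capture the integer and fractional parts and rejoin them with a dot.
via_groups = re.sub(r"(\d+),(\d+)", r"\1.\2", raw)

# Plain replacement route: swap the comma for a dot.
via_replace = raw.replace(",", ".")

price = float(via_replace)
```

Plain replacement is simpler, but it would also rewrite commas used as thousands separators, which the regex route can be tightened to avoid.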

The syntax of capturing expressions follows that supported by the Java Pattern class, which provides considerable flexibility when processing semi-structured text. There are limitations to this method, however: completely unstructured text may require a separate pre-processing stage.    (C3X)

Missing Value Handling    (C3Y)

To deal with missing values, default values can also be specified in references. A default value clause is provided to assign these values. This clause is indicated by mm:default, and is followed by one or more value specifications. For example, the following expression uses this clause to indicate that the value 0.0 should be used as a price if the price sub-element is missing:    (C3Z)
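The effect of such a default can be sketched with ElementTree on a hypothetical document in which one book has no price sub-element. The `price_of` helper is an assumption for illustration; it also treats an empty price element as missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical books document; the second book has no price sub-element.
XML = """
<books>
  <book><name>Huckleberry Finn</name><price>12.50</price></book>
  <book><name>Moby Dick</name></book>
</books>
"""

root = ET.fromstring(XML)


def price_of(book, default="0.0"):
    """Return the book's price, falling back to `default` when it is missing or empty."""
    text = book.findtext("price", default=default)
    return float(text.strip() or default)


prices = [price_of(book) for book in root.findall("book")]
```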

XMLMaster also has additional behaviors to deal with missing values. The default behavior is to skip an entire expression if it contains any references with empty content. Four keywords are supplied to modify this behavior. They indicate that when a reference resolves to empty content: (1) an error should be thrown and the mapping process should be stopped (mm:ErrorIfEmptyLocation); (2) the expression should be skipped (mm:SkipIfEmptyLocation); (3) a warning should be generated and the reference should be skipped (mm:WarningIfEmptyLocation); and (4) expressions containing these references should be processed (mm:ProcessIfEmptyLocation). The last option allows processing of documents that contain many missing values. The option indicates that the language processor should, if possible, conservatively drop the sub-expression containing the empty reference rather than dropping the entire expression.    (C41)

Consider, for example, the following expression declaring book individuals and their prices:    (C42)

Here, using the default skip behavior, a missing price element will cause the entire expression to be skipped. However, the process directive for the price property will instead drop only the sub-expression containing it if that element is empty. As a result, the expression will still declare an individual. More fine-grained empty value handling is also supported to specify different empty value handling behaviors for mm:DataValue, rdf:ID and rdfs:label values. Here, the label directives are mm:ErrorIfEmptyLabel, mm:SkipIfEmptyLabel, mm:WarningIfEmptyLabel, and mm:ProcessIfEmptyLabel, with equivalent keywords for RDF identifier and data value handling.    (C44)
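The four empty-location behaviors can be contrasted in a small sketch. The `map_book` helper, the dictionary it returns, and the exception class are hypothetical; only the directive keywords come from the text, and the sketch assumes that a warning-level reference causes the expression to be skipped.

```python
import warnings


class EmptyLocationError(Exception):
    """Raised when a reference is empty and the error directive is in force."""


def map_book(name, price, price_policy="mm:SkipIfEmptyLocation"):
    """Return a declared individual, or None if the whole expression is skipped."""
    individual = {"name": name, "facts": {}}
    if price:
        individual["facts"]["price"] = float(price)
        return individual
    if price_policy == "mm:ErrorIfEmptyLocation":
        raise EmptyLocationError("price reference resolved to empty content")
    if price_policy == "mm:WarningIfEmptyLocation":
        warnings.warn("price reference is empty; expression skipped")
        return None
    if price_policy == "mm:SkipIfEmptyLocation":
        return None  # default: skip the entire expression
    if price_policy == "mm:ProcessIfEmptyLocation":
        return individual  # drop only the facts sub-expression
    raise ValueError(f"unknown directive: {price_policy}")


skipped = map_book("Moby Dick", "")
processed = map_book("Moby Dick", "", price_policy="mm:ProcessIfEmptyLocation")
```

With the process directive, the individual is still declared and only the price fact is omitted.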