This page discusses the rationale, design, and implementation of the JDBC backend for Protégé-2000. RayFergerson    (418)

Why we did it the way we did    (419)

There are a number of ways to map an object-oriented schema (essentially part of what Protégé produces) to an RDBMS schema. These different ways all involve a number of tradeoffs between optimizing for space, time, queries, modifications, insertion, deletion, etc. We wanted to support the possibility that our users might want to map onto a legacy, perhaps even non-relational, database. Different types of mappings also seem more appropriate at different development stages. Early on when you are just trying to get the class structure right you want something where the schema is easy to change. Later on when you are working with instances you may want something that is optimal for insertion, modification, and removal of instances and perhaps has a "natural" feel about the mapping of classes to tables. Even later when you are concentrating on applications you may want something that is optimized for exactly the queries that are most important to you. It was clear that no matter which approach we took for the backend, we was unlikely either to make very many people happy, or to keep them happy for very long. Thus there seemed to be two approaches possible:    (41A)

  1. Develop (or use) a mapping language that would allow us to map the ontology to any RDBMS schema    (41B)
  2. Allow users to write code to plugin into our system to access their database with whatever schema and optimizations they choose, and then provide a simple example implementation of this plugin.    (41C)

There are a number of commercial products that address option 1 by parsing java source files and producing RDBMS schema and interface classes of various types. We don't have java source laying around for classes created inside of Protégé. More importantly, we allow arbitrary changes to the class structure after we have acquired instances. When the classes are changed we try not to lose any more information from the instances than necessary. We also allow arbitrary "type migration" of instances from one class to another. These features are really central to the way users interact with our tool and we were not aware of any commercial support for such a thing.    (41D)

Thus developing something along the lines of option 1 was a lot of work and is orthogonal to the main thrust of our research here. We see ourselves as providing usable tools for knowledge base development and use and not as the world's experts on OO-to-relational mapping tools. Also, our bias is against doing such a large project with no experience with simpler systems. We prefer to build big things by extending little things that have proven to be inadequate. In this case we didn't have any experience building the little corresponding little things, inadequate or otherwise.    (41E)

Thus we choose option 2 above. This choice is also consistent with the rest of the Protégé plugin architecture. We chose the simplest schema that we could think of and focused on the "maximal change" usage where the class structure and hierarchy is undergoing constant change. In this design, therefore, there is no attention paid to things such as query performance of any type.    (41F)

What we did    (41G)

We have a single table that stores the entire contents of the knowledge base. This includes classes, slots, facets and instances. The table has a fixed number of columns which are listed below. The Protégé metaclass architecture (which is modeled after the CLOS Meta-Object Protocol) is used explicitly in the table to simplify things: we treat all classes, slots, and facets as frames. Each entry in the database corresponds to a frame in Protege. Classes have slots such as ":DIRECT_SUPERCLASS" to maintain the inheritance hierarchy. All frames have a :NAME slot which contains the name of the frame. The database format is:    (41H)

 Column			  SQL Type  Can be null	  Can be empty	Description    (41I)
 frame			   integer	No		No	frame id  Frame ID's < 10000 are reserved for the system.   
								The frame ids for system frames are declared in the 
                                                                file: edu.stanford.smi.protege.model.Model.java    (41J)
 frame_type		   smallint	No		No	same as "value_type" but for the frame column    (41K)
 slot			   integer	No		No	slot frame id    (41L)
 facet			   integer	No		No	facet frame id (0 if not a facet value)    (41M)
 is_template		   smallint	No		No	0 => value is OKBC "own", 1 => value is OKBC "template"    (41N)
 value_index		   integer	No		No	number used to maintain relative ordering of slot_or_facet_value 
                                                                entries for a frame-slot(-facet) combination    (41O)
 value_type		   smallint	No		No	number used to indicate the "type" of the value 
                                                                stored in slot_or_facet_value.  
								The number-to-type conversion is given in the 
                                                                file: edu.stanford.smi.protege.storage.database.DatabaseUtils.java    (41P)
 slot_or_facet_value	   varchar(N)	Yes		No	facet value if facet is not 0, slot value otherwise. Holds 
                                                                values of length that will fit in a varchar (typically <= 255)    (41Q)
 long_slot_or_facet_value  longvarchar	Yes		No	same as slot_or_facet_value but holds values too long to fit 
                                                                in slot_or_facet_value    (41R)

Note:    (41S)

If you browse a Protégé database you may notice that, in the case of the superclass and subclass relations, we appear to store duplicate information. For example with class A we store that its subclass is B and with B we store that its superclass is A. Maintaining separate records for these relations is necessary because we want to be able to maintain the ordering of both subclasses and superclasses. So while the "slot value" information is indeed duplicated in these records, the "index" information is unique. (Subclass ordering is a user-interface feature that a number of our users have requested. Protégé attaches no meaning to the ordering of superclasses or subclasses.)    (41T)