What if every subject that we think about can have explicit representation in our computers?

2008 Semantic Technology Conference: random observations

 

I am back from the Semantic Technology Conference. It gets bigger each year. This year there were more than a hundred sessions, a full day of tutorials, and a product exhibition. It was quite crowded and energizing.

Just some random observations:

  • Oracle is improving RDF/OWL support in the 11g database and considers RDF/OWL strategic, enabling technologies that will be leveraged in future versions of Oracle products.
  • Yahoo uses RDF to organize content on various web sites. It also introduced SearchMonkey, an extension to the Yahoo search platform that makes it possible to provide more detailed information about information resources.
  • Consumer-oriented web sites powered by semantic technologies are here. Twine, Freebase, and Powerset are good examples, with more to come.
  • Resource-Oriented Architecture and RDF could be a very powerful combination. More and more people understand the value of exposing data through URIs in the form of information resources. The Linked Data initiative looks quite interesting.
  • Some advanced semantic applications use knowledge representation formalisms that go beyond the basic RDF/OWL model. But RDF/OWL can still be used to surface and exchange information based on W3C standards. There were lots of discussions about information provenance, trust, and “semantic spam”.
  • It looks like there is a workable solution (a compromise) for the ‘Web’s Identity Crisis’. The idea is to reserve the HTTP 303 (“See Other”) code to indicate “Concept URIs”. A 303 response should include an additional URI for the “See Other” information resource. This approach, combined with new PURL-like servers, makes it possible to keep RDF “as is” and to implement something close to the idea of Published Subject Identifiers.
  • Franz demonstrated a new version of the AllegroGraph 64-bit RDFStore. Franz implemented support for Named Graphs (which can be used to represent weights, trust factors, and provenance) and incorporated geospatial and temporal libraries. Named Graphs make it possible to deal with contexts using RDF.
  • Text analysis tools keep getting better. Ontos is an interesting example. Incorporating natural language processors makes it possible to extract entities and relationships with a reasonable level of precision (News Portal sample).
  • Doug Lenat gave a great presentation at the conference about the history of the Cyc project. It looks like in 5-10 years we can expect “artificial intelligent assistants” with quite sophisticated reasoning abilities.
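
The HTTP 303 idea from the list above can be illustrated with a small Ruby sketch. It is only an illustration under assumptions: the `/id/` and `/doc/` path prefixes, the `respond_to_get` helper and the sample data are hypothetical and do not come from any specification or from the conference talks.

```ruby
# Hypothetical URI layout (an assumption for this sketch):
#   /id/<name>  names a concept, something that cannot be sent over the wire
#   /doc/<name> is an information resource that describes the concept
CONCEPT_PREFIX  = "/id/"
DOCUMENT_PREFIX = "/doc/"

# Returns [status, headers, body] for a GET request on the given path.
def respond_to_get(path, known_subjects)
  if path.start_with?(CONCEPT_PREFIX)
    name = path.delete_prefix(CONCEPT_PREFIX)
    return [404, {}, "unknown subject"] unless known_subjects.key?(name)
    # Concept URI: answer 303 "See Other" and point to a describing document.
    [303, { "Location" => DOCUMENT_PREFIX + name }, ""]
  elsif path.start_with?(DOCUMENT_PREFIX)
    name = path.delete_prefix(DOCUMENT_PREFIX)
    return [404, {}, "unknown document"] unless known_subjects.key?(name)
    # Information resource: answer 200 with a representation.
    [200, { "Content-Type" => "text/plain" }, known_subjects[name]]
  else
    [404, {}, "not found"]
  end
end

# Made-up sample data.
SUBJECTS = { "Boston_MA_US" => "Boston is a city in Massachusetts, US." }.freeze

status, headers, _body = respond_to_get("/id/Boston_MA_US", SUBJECTS)
puts status
puts headers["Location"]
```

A client that receives 303 knows that the original URI identifies a subject rather than a document, and it can follow the `Location` header to get information about that subject.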
 
· 2008 SemTech Conference · Cyc · Douglas Lenat · Franz Inc · Ontos AG · Oracle (Corporation) · OWL · PSI · RDF · Resource-Oriented Architecture · Yahoo! ·

Serendipitous reuse and representations with basic ontological commitments

 

Steve Vinoski published a very interesting article, Serendipitous Reuse. He also provided additional comments in his blog. The author explores the benefits of RESTful uniform interfaces based on the HTTP “verbs” GET, PUT, POST and DELETE for building expansible distributed systems. He also compares the RESTful approach with traditional SOA implementations based on strongly typed, operation-centric interfaces.

Serendipitous reuse is one of the main goals of Subject-centric computing. In addition to uniform interfaces, Subject-centric computing promotes the use of uniform representations with basic ontological commitments (as one of the possible representations).

One of the fundamental principles of Resource-Oriented Architecture is support for multiple representations of the same resource. For example, if we have a RESTful service that collects information about people, a GET request can return any of several representations.
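
As a rough Ruby sketch of that principle (not code from any real service), one resource can be kept as plain data and rendered on request according to the media type the client asks for. The `RENDERERS` table and the `represent` helper are names invented for this illustration.

```ruby
require "json"

# State of one resource, kept independently of any representation.
PERSON = {
  "id" => "John_Smith", "type" => "Person",
  "first_name" => "John", "last_name" => "Smith",
  "born_in" => { "id" => "Boston_MA_US", "name" => "Boston" }
}.freeze

# Each renderer turns the same resource state into one representation.
RENDERERS = {
  "application/json" => ->(p) { JSON.generate(p) },
  "text/plain" => ->(p) { "#{p['first_name']} #{p['last_name']} was born in #{p['born_in']['name']}" }
}.freeze

# Picks a renderer by media type, as a server might do for the Accept header.
def represent(resource, media_type)
  renderer = RENDERERS.fetch(media_type) do
    raise ArgumentError, "unsupported media type: #{media_type}"
  end
  renderer.call(resource)
end

puts represent(PERSON, "text/plain")
puts represent(PERSON, "application/json")
```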

Example using JSON:


{
    "id":          "John_Smith",
    "type":        "Person",
    "first_name":  "John",
    "last_name":   "Smith",    
    "born_in":      {
                        "id": "Boston_MA_US", 
                        "name": "Boston" 
                    }
} 

Example using one of the “domain specific” XML vocabularies:



<person id="John_Smith">
    <first_name>John</first_name>
    <last_name>Smith</last_name>
    <born_in ref="Boston_MA_US">Boston</born_in>
</person>    

Example using one of the “domain independent” XML vocabularies:



<object obj_id="John_Smith">
        <property prop_id="first_name" prop_name="first name">John</property>
        <property prop_id="last_name" prop_name="last name">Smith</property>
        <property prop_id="born_in" prop_name="born in" val_ref="Boston_MA_US">
                 Boston
        </property>
</object>    

Example using HTML:



<div class="object">
    <div class="data-property-value">
        <div class="property">first name</div>
        <div class="value">John</div>
    </div>    
    <div class="data-property-value">
        <div class="property">last name</div>
        <div class="value">Smith</div>
    </div>    
    <div class="object-property-value">
        <div class="property">born in</div>
        <div class="value">
            <a href="/Boston_MA_US">Boston</a>
        </div>
    </div>    
</div>    

Example using text:


John Smith was born in Boston

These five formats are examples of data-centric representations without built-in ontological commitments. They do not define any relationship between a representation and things in the “real world”. Programs that communicate using JSON, for example, do not “know” what “first_name” means. It is just a string used as a key in a hash table.

Creators of RESTful services typically define additional constraints and a default interpretation for the corresponding data-centric representations. For example, we can agree to use the “id” string in a JSON-based representation as an object identifier, and we can publish a human-readable document that describes and clarifies this agreement. But the key point is that this agreement is not part of the JSON format.
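
A small Ruby sketch may make this concrete. It assumes a hypothetical helper, `object_identifier`, that encodes the “id” agreement in application code. The JSON parser itself knows nothing about that agreement.

```ruby
require "json"

JSON_TEXT = '{"id": "John_Smith", "type": "Person", "first_name": "John"}'

# To the parser, every key is just a string in a Hash.
PARSED = JSON.parse(JSON_TEXT)

# The agreement "the value under 'id' identifies the object" lives here,
# in application code and in human-readable documentation, not in JSON.
def object_identifier(representation)
  representation.fetch("id") do
    raise KeyError, "representation does not follow the 'id' convention"
  end
end

puts PARSED.class
puts object_identifier(PARSED)
```

If another service used “identifier” instead of “id”, the JSON would be just as valid, and only this application-level convention would break.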

Even if we are talking about a representation based on a domain-specific XML vocabulary, the semantic interpretation lies outside the vocabulary and is part of an informal schema description (using comments or annotations).

Interestingly enough, the level of usefulness differs across representations. In the case of plain text, for example, a computer can show the text “as is”. It is also possible to do full-text indexing and to implement simple full-text search.
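
To show how little a program needs to “understand” for this, here is a minimal inverted-index sketch in Ruby. The second document and the helper names `build_index` and `search` are made up for the example.

```ruby
# Build an inverted index: word => list of ids of documents containing it.
def build_index(documents)
  index = Hash.new { |hash, word| hash[word] = [] }
  documents.each do |doc_id, text|
    text.downcase.scan(/[a-z0-9]+/).uniq.each { |word| index[word] << doc_id }
  end
  index
end

# Return the ids of documents that contain every word of the query.
def search(index, query)
  words = query.downcase.scan(/[a-z0-9]+/)
  return [] if words.empty?
  words.map { |word| index.fetch(word, []) }.reduce(:&)
end

DOCS = {
  "john" => "John Smith was born in Boston",
  "mary" => "Mary Jones lives in Boston" # hypothetical second document
}.freeze
INDEX = build_index(DOCS)

p search(INDEX, "boston")
p search(INDEX, "born boston")
```

The index finds documents by the words they contain, but it has no notion that “Boston” is a city or that “John Smith” is a person.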

HTML-based representations add some structure, the ability to use styles, and linking between resources. Some link analysis can help to improve the results of basic full-text search.

If we look at representations based on Topic Maps, the situation is different. Topic Maps technology is a knowledge representation formalism, and it embeds a set of ontological commitments. Topic Maps-based representations, for example, commit to such categories as topics, subject identifiers, subject locators, names, occurrences (properties) and associations between topics. There is also a commitment to two association types: “instance-type” and “subtype-supertype”. Topic Maps also support contextual assertions (using scope).

In addition, Topic Maps promote the use of Published Subject Identifiers (PSIs) as a universal mechanism for identifying “things”.

Topic Maps-based representations are optimized for information merging. For example, computers can automatically merge fragments produced by different RESTful services:

Fragment 1 (based on a draft of the Compact Syntax for Topic Maps, CTM):


p:John_Smith
   isa po:person; 
   - "John Smith"; 
   - "John" @ po:first_name; 
   - "Smith" @ po:last_name
.

g:Boston_MA_US - "Boston"; isa geo:city. 

po:born_in(p:John_Smith : po:person, g:Boston_MA_US : geo:location)

Fragment 2:


g:Paris_FR - "Paris"; isa geo:city. 

po:likes(p:John_Smith : po:person, g:Paris_FR : o:object)

Result of automatic merging:


p:John_Smith
   isa po:person; 
   - "John Smith"; 
   - "John" @ po:first_name; 
   - "Smith" @ po:last_name
.

g:Boston_MA_US - "Boston"; isa geo:city. 

g:Paris_FR - "Paris"; isa geo:city. 

po:born_in(p:John_Smith : po:person, g:Boston_MA_US : geo:location)

po:likes(p:John_Smith : po:person, g:Paris_FR : o:object)
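
The merge above can be approximated with a short Ruby sketch. It is a deliberate simplification: topics are merged only when they share the same identifier, and the hash layout and the `merge_fragments` helper are assumptions made for this example, not part of any Topic Maps engine or of the full merging rules.

```ruby
# A fragment is modelled as:
#   { topics:       { id => { names: [...], types: [...] } },
#     associations: [ [type, { player_id => role_type }] ] }
def merge_fragments(*fragments)
  merged = { topics: {}, associations: [] }
  fragments.each do |fragment|
    fragment[:topics].each do |id, topic|
      # Topics with the same identifier describe the same subject.
      target = (merged[:topics][id] ||= { names: [], types: [] })
      target[:names] |= topic.fetch(:names, [])
      target[:types] |= topic.fetch(:types, [])
    end
    # Set union drops associations that are stated more than once.
    merged[:associations] |= fragment[:associations]
  end
  merged
end

FRAGMENT_1 = {
  topics: {
    "p:John_Smith"   => { names: ["John Smith"], types: ["po:person"] },
    "g:Boston_MA_US" => { names: ["Boston"], types: ["geo:city"] }
  },
  associations: [
    ["po:born_in", { "p:John_Smith" => "po:person", "g:Boston_MA_US" => "geo:location" }]
  ]
}.freeze

FRAGMENT_2 = {
  topics: {
    "p:John_Smith" => { names: [], types: [] }, # referenced only, no new names
    "g:Paris_FR"   => { names: ["Paris"], types: ["geo:city"] }
  },
  associations: [
    ["po:likes", { "p:John_Smith" => "po:person", "g:Paris_FR" => "o:object" }]
  ]
}.freeze

MERGED = merge_fragments(FRAGMENT_1, FRAGMENT_2)
p MERGED[:topics].keys
p MERGED[:associations].length
```

The point of the sketch is that no service-specific agreement is needed: shared identifiers alone are enough to decide what belongs together.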

Like any other representation formalism, Topic Maps are not ideal. But Topic Maps enthusiasts think that Topic Maps capture a “robust set” of ontological commitments which can drastically improve our ability to organize and manage information and to achieve real reuse of information with added value.

 
· CTM · Resource-Oriented Architecture · Subject-centric computing · Topic Maps ·

Slides from Topic Maps 2008

 

I gave a presentation and a tutorial at Topic Maps 2008.

Link to Ruby Topic Maps in Action

Link to Enterprise Search, Faceted Navigation and Subject-Centric Portals

 
· Subject-centric computing · Topic Maps ·

Authoring topic maps using Ruby-based DSL: CTM, the way I like it

 

Designing and using Domain Specific Languages (DSLs) is a popular programming style in the Ruby community. I am experimenting with a Ruby-based DSL for authoring topic maps. Surprisingly, the result is very close to my view of the “ideal” CTM (Compact Topic Maps syntax).

I would just like to share a sample that demonstrates the main ideas of this approach. It is a piece of Ruby code that generates topic maps behind the scenes.

The first topic map defines a simple ontology.


# some definitions to support DSL
# should be included

topic_map :ontology_tm do

  tm_base "http://www.example.com/topic_maps/people/" 

  topic(:person) {
    sid   "http://psi.example.com/Person" 
    name  "Person" 
    isa :topic_type
  }

  topic(:first_name) {
    sid   "http://psi.example.com/first_name" 
    name  "first name" 
    isa :name
  }

  topic(:last_name) {
    sid   "http://psi.example.com/last_name" 
    name  "last name" 
    isa :name
  }

  topic(:web_page) {
    sid   "http://psi.example.com/web_page" 
    name  "web page" 
    isa :occurrence
    datatype :uri
  }

  topic(:age) {
    sid   "http://psi.example.com/age" 
    name  "age" 
    isa :occurrence
    datatype :integer
  }

  topic(:description) {
    sid   "http://psi.example.com/description" 
    name  "description" 
    isa :occurrence
    datatype :string
  }

  topic(:works_for) {
    sid   "http://psi.example.com/works_for" 
    name  "works for" 
    isa :property
    association :employment
    first_role :employee
    second_role :employer
    third_role :position_type  
    third_role_prefix :as
  }

  topic(:likes) {
    sid   "http://psi.example.com/likes" 
    name  "likes" 
    isa [:property, :association]
    association :likes
    first_role :person
    second_role :object
  }

end

The second topic map includes the ontology and asserts some facts.

    
topic_map :facts_tm do  

  tm_base "http://www.example.com/topic_maps/people/john_smith" 

  tm_include :ontology_tm

  topic :john_smith do
      sid "http://psi.example.com/JohnSmith" 
      name  "John Smith" 
      name  "Johnny", :scope => :alt_name
      first_name "John" ; last_name  "Smith" 
      web_page "http://a.example.com/JohnSmith.htm" 
      works_for topic(:example_dot_com){
                              sid "http://www.example.com" 
                              name "example.com"; isa :company
                         }, 
                        :as => :program_manager, 
                        :scope => :date_2008_02_28
      likes [:italian_opera, :new_york]
      age 35
      description <<EOF
John Smith is a very nice person.
He works for example.com and likes Italian opera.
EOF

  end

end
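
The supporting definitions are not shown in this post. As a very rough idea of how such a DSL can be built in Ruby, here is a minimal sketch based on `instance_eval` and `method_missing`. It is not the actual implementation behind the samples above: the class names `TopicBuilder` and `TopicMapBuilder` and the `TOPIC_MAPS` registry are hypothetical, and features such as `tm_include`, nested topics, scopes and property declarations are left out.

```ruby
# Collects the statements made inside a topic block into a plain Hash.
class TopicBuilder
  attr_reader :data

  def initialize(id)
    @data = { id: id, properties: [] }
  end

  def sid(uri)
    @data[:sid] = uri
  end

  def isa(type)
    @data[:isa] = type
  end

  # Any other call with arguments (name, first_name, age, ...) is
  # recorded as a property; calls without arguments are not intercepted.
  def method_missing(property, *args)
    return super if args.empty?
    @data[:properties] << [property, *args]
  end
end

# Collects topics and the base URI of one topic map.
class TopicMapBuilder
  attr_reader :base, :topics

  def initialize
    @topics = {}
  end

  def tm_base(uri)
    @base = uri
  end

  def topic(id, &block)
    builder = TopicBuilder.new(id)
    builder.instance_eval(&block) if block
    @topics[id] = builder.data
    id
  end
end

# Registry of all topic maps defined so far (hypothetical).
TOPIC_MAPS = {}

def topic_map(id, &block)
  builder = TopicMapBuilder.new
  builder.instance_eval(&block)
  TOPIC_MAPS[id] = builder
end

topic_map :demo_tm do
  tm_base "http://www.example.com/topic_maps/people/"

  topic(:person) {
    sid  "http://psi.example.com/Person"
    name "Person"
    isa  :topic_type
  }
end

p TOPIC_MAPS[:demo_tm].topics[:person]
```

Because each block is evaluated with the builder as `self`, statements such as `sid` and `name` read like declarations while remaining ordinary Ruby method calls.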

 
· CTM · Domain Specific Language · Ruby (programming language) · Topic Maps ·

Subject-centric blog in XTM (Topic Maps interchange) format

 

XTM export has been available on the Subject-centric blog from the first day. But I think it was not obvious what readers could do with it. I added a link to the Subject-centric topic map in Omnigator (a Topic Maps browser).

I also recently made the XTM export compatible with the Expressing Dublin Core Metadata Using Topic Maps recommendations.

My plan is to connect (aggregate) Subject-centric with other Topic Maps-related blogs based on core “Subject-Resource” and simple “Blogging” ontologies.

I see XTM export as a small first step in promoting the SAVE AS XTM INITIATIVE and building a Topic Maps Grid.


 
· Topic Maps · XTM ·
