Tag Archives: Topic Maps

Subject-centric applications: toward subject-centric computing

Recently we added a small framework that lets us build and use subject-centric applications in Ontopedia. Within the traditional paradigm of application-centric computing, we first have to start an application (or go to some domain- or function-specific website), and only then can we change the application/website context. Within a subject-centric environment, we select a subject and then get access to the various applications/functions that can be used with the subject in context.

As an example, we implemented a basic subject-centric application, Company headquarters map. It can be used for various companies. The app tries to find the geo point of a company’s headquarters and maps it using the Google Maps service.

Live example: Tibco Software Headquarters on a map

Subject-centric apps can use information recorded in the Ontopedia knowledge map and provide rich, user-friendly interactivity. They can also use external data sources and submit pieces of information to the Ontopedia knowledge map.

If we have just one application, this is not too exciting. But we are talking about a framework that allows us to integrate multiple subject-centric apps that are relevant to various subjects. All these applications have one very important feature in common: they can ‘sense’ the current subject context. This approach can be used on the Web or on a new generation of desktops.
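
One way to picture such a framework is a registry that, given the subject in context, returns the apps that apply to it. The sketch below is only an illustration under stated assumptions: the class names, the `:type` key, the "Conference schedule" app and the `Tibco_Software` identifier are hypothetical and are not taken from the Ontopedia code.

```ruby
# Hypothetical registry: each app declares the subject types it can work with.
SubjectApp = Struct.new(:name, :subject_types)

class AppRegistry
  def initialize
    @apps = []
  end

  # Register an app together with the subject types it understands.
  def register(name, subject_types)
    @apps << SubjectApp.new(name, subject_types)
    self
  end

  # Return the names of all apps that apply to the subject in context.
  def apps_for(subject)
    @apps.select { |app| app.subject_types.include?(subject[:type]) }
         .map(&:name)
  end
end

registry = AppRegistry.new
registry.register("Company headquarters map", [:company])
registry.register("Conference schedule", [:conference]) # hypothetical second app

tibco = { id: "Tibco_Software", type: :company } # hypothetical subject record
puts registry.apps_for(tibco).inspect
```

The point of the design is that the user never picks an app first: the subject is selected, and the list of usable apps follows from it.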

More info about my perspective on knowledge maps/grids, subject-centric pages, widgets, and apps can be found in these references:

Enterprise Search, Faceted Navigation and Subject-Centric Portals; Topic Maps, 2008

Ruby, Topic Maps and Subject-centric Blogging: Tutorial, Topic Maps, 2008

Enterprise Knowledge Map; Topic Maps, 2007

Topic Maps Grid; Extreme Markup 2006

The fundamentals and possibilities are described perfectly in Steve Pepper’s presentation at Topic Maps 2008 (PPT format):

Everything is a Subject

“Creating Linked Data” by Jeni Tennison

Jeni Tennison published several excellent blog entries that describe the process of creating Linked Data. If you are interested in semantic technologies, you will find lots of important ideas in these postings.

From Jeni’s Musings:

Creating Linked Data – Part I: Analysing and Modelling

Creating Linked Data – Part II: Defining URIs

Creating Linked Data – Part III: Defining Concept Schemes

Creating Linked Data – Part IV: Developing RDF Schemas

Creating Linked Data – Part V: Finishing Touches

We have more and more Linked Data (published in RDF). I think that it is very important to find a way of using RDF data in Linked Topic Maps, and vice versa. The traditional approach to RDF/Topic Maps mapping is based on the idea of mapping between RDF Object Properties and Associations. I have been playing with the idea of mapping between Object Properties and Occurrences for quite some time. In Ontopedia, it is possible to represent RDF object properties directly as “occurrences”. Mapping between subject-centric occurrences and associations can be done later, as a result of inference. In Ontopedia, from the user’s point of view, there is no big difference between associations and subject-centric occurrences: identifiers, locators, names, associations, and occurrences are all rendered as subject “properties”. I think that with this approach, RDF data can be “naturally” integrated with Topic Maps-based data sets.
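
The idea of keeping RDF object properties as occurrences first, and deriving associations later, can be sketched roughly as follows. This is a minimal illustration, not Ontopedia’s implementation: the `Triple` struct, the hash layout and both function names are assumptions made for the example, and `rdf:type` is used simply because it is a familiar URI-valued property.

```ruby
# Each triple: subject URI, predicate URI, object, and a flag that says
# whether the object is itself a URI (an RDF object property).
Triple = Struct.new(:subject, :predicate, :object, :object_is_uri)

RDF_TYPE   = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Step 1: record every triple as an occurrence of its subject.
def triples_to_topics(triples)
  topics = Hash.new { |hash, id| hash[id] = { id: id, occurrences: [] } }
  triples.each do |triple|
    topics[triple.subject][:occurrences] << {
      type: triple.predicate,
      value: triple.object,
      datatype: triple.object_is_uri ? :uri : :string
    }
  end
  topics
end

# Step 2 (later, as inference): URI-valued occurrences are the ones
# that could be turned into associations.
def association_candidates(topic)
  topic[:occurrences].select { |occ| occ[:datatype] == :uri }
end

triples = [
  Triple.new("http://purl.ontopedia.net/TMRA_2008", RDF_TYPE,
             "http://purl.ontopedia.net/Conference", true),
  Triple.new("http://purl.ontopedia.net/TMRA_2008", RDFS_LABEL,
             "TMRA 2008 (Topic Maps Research and Applications Conference)", false)
]

tmra = triples_to_topics(triples)["http://purl.ontopedia.net/TMRA_2008"]
puts tmra[:occurrences].size
puts association_candidates(tmra).map { |occ| occ[:value] }.inspect
```

Nothing is lost by importing first and inferring later: every property is already visible on the subject, and the association step only adds structure.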

Watching an interview about Powerset

InfoQ published an interview with Tom Preston-Werner on Powerset, GitHub, Ruby and Erlang. I really like projects that try to analyze text/resources on the web and try to implement “smart search”. Powerset is one of these projects. But what I like even more is the approach in which we explicitly represent facts/information items using open knowledge representation standards such as Topic Maps or RDF.

Topic Maps can play the role of “knowledge middleware” that helps to integrate the various components of the “smart search puzzle”. A topic map-based index makes it possible to represent and connect subjects and resources. Explicit representation of a relatively small number of relationships (“facts”, “assertions”) between resources and subjects can dramatically change the world of smart search.

Topic Maps-based knowledge middleware is a disruptive technology because it replaces proprietary knowledge organization schemas and modules, and it allows multiple players to build various solutions that help to create or use a smart index.

The Topic Maps-based Ontopedia PSI server, for example, can represent assertions that are created manually by users or generated by algorithms. We do not have our own text analysis infrastructure, but I hope that in the future we can leverage services on the web (such as OpenCalais) that can perform text analysis on an “as needed” basis. The core ability of the Ontopedia PSI server is to maintain explicit representations of subjects that are important to people, together with assertions about these subjects.

The new version of the Ontopedia PSI server can play the role of an aggregator that extracts assertions from existing topic maps/fragments hosted on other websites. Assertions from multiple sources are aggregated into one assertion set/information map/semantic index. The Ontopedia PSI server keeps track of information provenance and supports multiple truth values. The server can, for example, handle a situation in which one source on the web asserts that Person X gave Presentation P and someone else makes the opposite assertion.
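
The Person X / Presentation P situation can be sketched as an assertion set that stores the source next to every assertion. This is an illustration only: the `Assertion` struct, the class, the source URLs and the use of just two truth values (`:true`, `:false`) are assumptions for the example and do not describe the server’s internal model.

```ruby
# An assertion keeps its truth value and its provenance (the source).
Assertion = Struct.new(:subject, :predicate, :object, :truth, :source)

class AssertionSet
  def initialize
    @assertions = []
  end

  def add(assertion)
    @assertions << assertion
    self
  end

  # Group assertions by statement so each source's truth value is kept.
  def statements
    @assertions.group_by { |a| [a.subject, a.predicate, a.object] }
  end

  # Statements for which the sources disagree.
  def conflicts
    statements.select { |_stmt, list| list.map(&:truth).uniq.size > 1 }.keys
  end

  # Who said what about one statement.
  def provenance(subject, predicate, object)
    statements.fetch([subject, predicate, object], [])
              .map { |a| [a.source, a.truth] }
  end
end

set = AssertionSet.new
set.add(Assertion.new("Person_X", "presented", "Presentation_P",
                      :true, "http://a.example.org"))   # hypothetical source
set.add(Assertion.new("Person_X", "presented", "Presentation_P",
                      :false, "http://b.example.org"))  # hypothetical source

puts set.conflicts.inspect
puts set.provenance("Person_X", "presented", "Presentation_P").inspect
```

Neither assertion overwrites the other; the aggregator keeps both and leaves the decision to whoever consumes the index.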

I think that natural language processing can play a huge role in improving search. An ideal text analysis tool should let us provide ‘clues’ about the subjects in a text. I am looking for an equivalent of the kind of ‘binding’ that is used quite often in programming these days. I would love to be able to give a text analysis tool a list of the main subjects in the form of PSIs (using embedded markup or attached external assertions). If I do so, I expect much more precise results. If I do not have an initial list of subjects, I expect some kind of suggestions from the text analysis tool that I can check against an existing information map.

Ontopedia (like many other Topic Maps-based projects) promotes the use of Public Subject Identifiers (PSIs) for “all thinkable” subjects. For example, there is an identifier for the TMRA 2008 conference: http://psi.ontopedia.net/TMRA_2008. There are identifiers for each presenter and presentation. Basic relationships between various subjects are also “mapped”, that is, explicitly represented. Each basic resource, such as a blog post, can have a small assertion set that describes its metadata (using the Dublin Core metadata vocabulary, for example) and perhaps some main assertions. Traditional websites can provide combined assertion sets in XTM or RDF, which can be consumed by semantic aggregators such as the Ontopedia PSI server. Text analysis is great (when it is good enough). But even simple (semi-)manual “mapping” of subjects, resources and relationships can change the search game.

When we try to “map” an existing resource such as a conference website manually for the first time, it can look like a complicated and time-consuming task. Mapping a website for another conference will take much less time. And, of course, in many cases it is possible to reverse the traditional website building/assertion extraction paradigm.

It is possible to build nice-looking and functional websites based on “assertion sets”. Topicmaps.com is a great example of this approach. It is driven by a topic map. Humans can enjoy the HTML-based representation of this site, and aggregators like the Ontopedia PSI server can consume the raw XTM-based representation and aggregate it with other assertion sets such as the TMRA 2008 conference assertion set.

References

Interview link on InfoQ

Extending Ontopedia PSI server to handle PURLs: support for RDF, step one

I have been thinking about RDF support on the Ontopedia PSI server for quite some time. The Semantic Technology Conference that I attended this spring gave me some new ideas in this direction. I decided to follow the recommendations from Eric Miller’s and David Wood’s presentation “Persistent Identifiers for the ‘Real Web’” regarding PURLs (Persistent Uniform Resource Locators). The Ontopedia PSI server was extended to handle PURLs.

Each Published Subject Identifier (PSI) on http://psi.ontopedia.net has an equivalent PURL on http://purl.ontopedia.net. For example, http://psi.ontopedia.net/TMRA_2008 has the corresponding PURL http://purl.ontopedia.net/TMRA_2008. What happens when we type the PURL http://purl.ontopedia.net/TMRA_2008 into our browser? The Ontopedia PURL server returns HTTP code 303 “See Other” with the “Location” header set to http://psi.ontopedia.net/TMRA_2008.

For RDF-based applications, code 303 is an indication that the URI does not correspond to a “digital resource”. Web browsers will automatically jump to http://psi.ontopedia.net/TMRA_2008, which provides a nice subject/resource description.
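
The redirect logic described above can be sketched in a few lines. This is a minimal illustration, not the server’s actual code: it assumes that a PURL and its PSI differ only in the host name (as in the TMRA_2008 example), and the function names and the response hash layout are made up for the example.

```ruby
require "uri"

PURL_HOST = "purl.ontopedia.net"
PSI_HOST  = "psi.ontopedia.net"

# Map a PURL to its PSI by swapping the host (assumption: same path on both).
def purl_to_psi(purl)
  uri = URI.parse(purl)
  raise ArgumentError, "not an Ontopedia PURL: #{purl}" unless uri.host == PURL_HOST

  uri.host = PSI_HOST
  uri.to_s
end

# Describe the HTTP answer: 303 "See Other" plus a Location header.
def see_other_response(purl)
  { status: 303, headers: { "Location" => purl_to_psi(purl) } }
end

puts see_other_response("http://purl.ontopedia.net/TMRA_2008").inspect
```

A browser follows the `Location` header automatically, while an RDF client can stop at the 303 and conclude that the PURL names a subject rather than a document.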

When we need to export RDF assertions from Ontopedia, we can do something like this:

<rdf:Description rdf:about="http://purl.ontopedia.net/TMRA_2008">
    <rdfs:label>
        TMRA 2008 (Topic Maps Research and Applications Conference)
    </rdfs:label>
    <rdfs:comment>
        Fourth International Conference on
        Topic Maps Research and Applications
    </rdfs:comment>
    <rdf:type rdf:resource="http://purl.ontopedia.net/Conference"/>
</rdf:Description>

In the Topic Maps-based version we can have:

<topic id="id_98c49a0d3d87f067a4ba13b6d2f6d086">
    <subjectIdentifier href="http://psi.ontopedia.net/TMRA_2008"/>
    <instanceOf>
        <topicRef href="http://psi.ontopedia.net/Conference"/>
    </instanceOf>
    <name>
        <value>
            TMRA 2008 (Topic Maps Research and Applications Conference)
        </value>
    </name>
    <occurrence>
        <type>
            <topicRef href="http://psi.ontopedia.net/Description"/>
        </type>
        <resourceData>
            Fourth International Conference on
            Topic Maps Research and Applications
        </resourceData>
    </occurrence>
</topic>

The RDF-based version uses PURLs and the Topic Maps-based version uses PSIs to identify subjects/resources.

Reference:

Persistent Identifiers for the ‘Real Web’, David Wood, Eric Miller, May 2008, PDF

The new version of Ontopedia PSI server

The new version of the Ontopedia PSI server is out now. It can represent various types of assertions related to subjects (names, occurrences, associations). The new PSI server also makes it possible to record and integrate the opinions of different users. Its internal knowledge representation is optimized for paraconsistent reasoning.

I started to play with some topics that I am interested in, for example Subject-centric Computing and Apple Inc.

As with a typical Topic Maps-based system, we can easily add new subject and assertion types; we are not limited by fixed domain models. In addition, the new PSI server supports recording of assertion provenance and five truth values.

We also tried to follow the Resource-Oriented Architecture: each subject, each assertion, and each subject-centric group of assertions of the same type has its own URI and “page”.

The main goal of this version is to experiment with assertion-level subject-centric representations vs. the more traditional portal-based approach.

Serendipitous reuse and representations with basic ontological commitments

Steve Vinoski published a very interesting article: Serendipitous reuse. He also provided additional comments in his blog. The author explores the benefits of RESTful uniform interfaces based on the HTTP “verbs” GET, PUT, POST and DELETE for building extensible distributed systems. He also compares the RESTful approach with traditional SOA implementations based on strongly typed, operation-centric interfaces.

Serendipitous reuse is one of the main goals of Subject-centric computing. In addition to uniform interfaces, Subject-centric computing promotes the use of uniform representations with basic ontological commitments (as one of the possible representations).

One of the fundamental principles of the Resource-Oriented Architecture is support for multiple representations of the same resource. For example, if we have a RESTful service that collects information about people, a GET request can return multiple representations.

Example using JSON:


{
	"id":          "John_Smith",
	"type":        "Person",
	"first_name":  "John",
	"last_name":   "Smith",	
	"born_in":      {
			   "id": "Boston_MA_US", 
			   "name": "Boston"
			}
} 

Example using one of the “domain specific” XML vocabularies:


<person id="John_Smith">
	<first_name>John</first_name>
	<last_name>Smith</last_name>
	<born_in ref="Boston_MA_US">Boston</born_in>
</person>	

Example using one of the “domain independent” XML vocabularies:


<object obj_id="John_Smith">
        <property prop_id="first_name" prop_name="first name">John</property>
        <property prop_id="last_name" prop_name="last name">Smith</property>
        <property prop_id="born_in" prop_name="born in" val_ref="Boston_MA_US">
                 Boston
        </property>
</object>	

Example using HTML:


<div class="object">
	<div class="data-property-value">
		<div class="property">first name</div>
		<div class="value">John</div>
	</div>	
	<div class="data-property-value">
		<div class="property">last name</div>
		<div class="value">Smith</div>
	</div>	
	<div class="object-property-value">
		<div class="property">born in</div>
		<div class="value">
			<a href="/Boston_MA_US">Boston</a>
		</div>
	</div>	
</div>	

Example using text:


John Smith was born in Boston

These five formats are examples of data-centric representations without built-in ontological commitments. These formats do not define any relationship between representation and things in the “real world”. Programs which communicate using JSON, for example, do not “know” what “first_name” means. It is just a string that is used as a key in a hash table.

Creators of RESTful services typically define additional constraints and a default interpretation for the corresponding data-centric representations. For example, we can agree to use the “id” string in a JSON-based representation as an object identifier, and we can publish a human-readable document that describes and clarifies this agreement. But the key point is that this agreement is not part of the JSON format.
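
A small sketch makes the point concrete. The JSON parser below returns nothing but nested hashes and strings; the rule “an `id` key identifies an object” exists only in the helper function, which is a hypothetical name written for this example.

```ruby
require "json"

doc = <<~JSON
  {
    "id": "John_Smith",
    "type": "Person",
    "first_name": "John",
    "last_name": "Smith",
    "born_in": { "id": "Boston_MA_US", "name": "Boston" }
  }
JSON

data = JSON.parse(doc) # plain Hash; "id" is just another string key here

# Our agreement, expressed in code rather than in the JSON format:
# every Hash that has an "id" key describes an identified object.
def collect_object_ids(value)
  case value
  when Hash
    own = value.key?("id") ? [value["id"]] : []
    own + value.values.flat_map { |v| collect_object_ids(v) }
  when Array
    value.flat_map { |v| collect_object_ids(v) }
  else
    []
  end
end

puts collect_object_ids(data).inspect
```

A client that was never told about the agreement would parse the same document just as successfully and learn nothing about identity from it.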

Even if we are talking about a representation based on a domain specific XML vocabulary, semantic interpretation is outside of this vocabulary and is a part of an informal schema description (using comments or annotations).

Interestingly enough, the level of usefulness differs between representations. In the case of plain text, for example, a computer can show the text “as is”. It is also possible to do full-text indexing and to implement simple full-text search.

HTML-based representations add some structure, the ability to use styles, and linking between resources. Some link analysis can help to improve the results of basic full-text search.

If we look at representations based on Topic Maps, the situation is different. Topic Maps technology is a knowledge representation formalism, and it embeds a set of ontological commitments. Topic Maps-based representations, for example, commit to such categories as topics, subject identifiers, subject locators, names, occurrences (properties) and associations between topics. There is also a commitment to two association types: “instance-type” and “subtype-supertype”. Topic Maps also support contextual assertions (using scope).

In addition, Topic Maps promote the use of Published Subject Identifiers (PSIs) as a universal mechanism for identifying “things”.

Topic Maps-based representations are optimized for information merging. For example, computers can _automatically_ merge fragments produced by different RESTful services:

Fragment 1 (based on a draft of the Compact Syntax for Topic Maps, CTM):


p:John_Smith
   isa po:person; 
   - "John Smith"; 
   - "John" @ po:first_name; 
   - "Smith" @ po:last_name
.

g:Boston_MA_US - "Boston"; isa geo:city. 

po:born_in(p:John_Smith : po:person, g:Boston_MA_US : geo:location)

Fragment 2:


g:Paris_FR - "Paris"; isa geo:city. 

po:likes(p:John_Smith : po:person, g:Paris_FR : o:object)

Result of automatic merging:


p:John_Smith
   isa po:person; 
   - "John Smith"; 
   - "John" @ po:first_name; 
   - "Smith" @ po:last_name
.

g:Boston_MA_US - "Boston"; isa geo:city. 

g:Paris_FR - "Paris"; isa geo:city. 

po:born_in(p:John_Smith : po:person, g:Boston_MA_US : geo:location)

po:likes(p:John_Smith : po:person, g:Paris_FR : o:object)
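
The merge step itself can be approximated in a few lines. The sketch below is a simplification under stated assumptions: topics are keyed by the prefixed identifier used in the fragments, names are stored as value/scope pairs, and the hash layout and function name are invented for the example; a real Topic Maps engine applies the full merging rules of the standard.

```ruby
# Merge fragments: topics with the same identifier become one topic,
# and duplicate names, types and associations collapse.
def merge_fragments(*fragments)
  merged = { topics: {}, associations: [] }
  fragments.each do |fragment|
    fragment[:topics].each do |id, topic|
      target = (merged[:topics][id] ||= { types: [], names: [] })
      target[:types] |= topic[:types]
      target[:names] |= topic[:names]
    end
    merged[:associations] |= fragment[:associations]
  end
  merged
end

fragment1 = {
  topics: {
    "p:John_Smith" => {
      types: ["po:person"],
      names: [["John Smith", nil], ["John", "po:first_name"], ["Smith", "po:last_name"]]
    },
    "g:Boston_MA_US" => { types: ["geo:city"], names: [["Boston", nil]] }
  },
  associations: [["po:born_in", "p:John_Smith", "g:Boston_MA_US"]]
}

fragment2 = {
  topics: { "g:Paris_FR" => { types: ["geo:city"], names: [["Paris", nil]] } },
  associations: [["po:likes", "p:John_Smith", "g:Paris_FR"]]
}

result = merge_fragments(fragment1, fragment2)
puts result[:topics].keys.inspect
puts result[:associations].size
```

Because both fragments use the same identifier for John Smith, the association from the second fragment attaches to the topic from the first without any service-specific glue code.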

Like any other representation formalism, Topic Maps are not ideal. But Topic Maps enthusiasts think that Topic Maps capture a “robust set” of ontological commitments that can drastically improve our ability to organize and manage information and to achieve real reuse of information with added value.

Authoring topic maps using Ruby-based DSL: CTM, the way I like it

Designing and using Domain Specific Languages (DSLs) is a popular programming style in the Ruby community. I am experimenting with a Ruby-based DSL for authoring topic maps. Surprisingly, the result is very close to my view of the “ideal” CTM (Compact Topic Maps syntax).

I would just like to share a sample that demonstrates the main ideas of this approach. It is a piece of Ruby code that generates topic maps (behind the scenes).

The first topic map defines a simple ontology.


# some definitions to support DSL
# should be included

topic_map :ontology_tm do
  
  tm_base "http://www.example.com/topic_maps/people/"

  topic(:person) {
    sid   "http://psi.example.com/Person"
    name  "Person"
    isa :topic_type
  }
  
  topic(:first_name) {
    sid   "http://psi.example.com/first_name"
    name  "first name"
    isa :name
  }

  topic(:last_name) {
    sid   "http://psi.example.com/last_name"
    name  "last name"
    isa :name
  }
  
  topic(:web_page) {
    sid   "http://psi.example.com/web_page"
    name  "web page"
    isa :occurrence
    datatype :uri
  }

  topic(:age) {
    sid   "http://psi.example.com/age"
    name  "age"
    isa :occurrence
    datatype :integer
  }
  
  topic(:description) {
    sid   "http://psi.example.com/description"
    name  "description"
    isa :occurrence
    datatype :string
  }
  
  topic(:works_for) {
    sid   "http://psi.example.com/works_for"
    name  "works for"
    isa :property
    association :employment
    first_role :employee
    second_role :employer
    third_role :position_type  
    third_role_prefix :as
  }
  
  topic(:likes) {
    sid   "http://psi.example.com/likes"
    name  "likes"
    isa [:property, :association]
    association :likes
    first_role :person
    second_role :object
  }
  
end
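
The “definitions to support DSL” are not shown in this post. As a rough idea of what such support code could look like, here is a minimal sketch built on `instance_eval` and `method_missing`. It is an assumption-laden illustration, not the code behind the sample: the builder classes, the `TOPIC_MAPS` registry and the internal hash layout are all hypothetical, and only `tm_base`, `topic`, `sid`, `name` and `isa` are handled explicitly.

```ruby
# Collects the statements made inside one topic block.
class TopicBuilder
  attr_reader :data

  def initialize(id)
    @data = { id: id, sids: [], names: [], types: [], properties: {} }
  end

  def sid(uri)
    @data[:sids] << uri
  end

  def name(value, options = {})
    @data[:names] << (options.empty? ? value : [value, options])
  end

  def isa(type)
    @data[:types].concat(Array(type))
  end

  # Any other call with arguments (datatype, first_role, ...) is
  # recorded as a generic property of the topic.
  def method_missing(prop, *args)
    return super if args.empty?

    @data[:properties][prop] = args.size == 1 ? args.first : args
  end
end

# Collects the topics declared inside one topic_map block.
class TopicMapBuilder
  attr_reader :base, :topics

  def initialize
    @topics = {}
  end

  def tm_base(uri)
    @base = uri
  end

  def topic(id, &block)
    builder = TopicBuilder.new(id)
    builder.instance_eval(&block) if block
    @topics[id] = builder.data
  end
end

TOPIC_MAPS = {} # hypothetical registry of the maps defined so far

def topic_map(id, &block)
  builder = TopicMapBuilder.new
  builder.instance_eval(&block)
  TOPIC_MAPS[id] = builder
end

topic_map :demo_tm do
  tm_base "http://www.example.com/topic_maps/people/"

  topic(:age) {
    sid  "http://psi.example.com/age"
    name "age"
    isa  :occurrence
    datatype :integer
  }
end

puts TOPIC_MAPS[:demo_tm].topics[:age].inspect
```

Evaluating each block with `instance_eval` is what lets the DSL read like a declaration: `sid`, `name` and `isa` resolve against the builder object rather than against the surrounding script.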

The second topic map includes the ontology and asserts some facts.

	
topic_map :facts_tm do  
  
  tm_base "http://www.example.com/topic_maps/people/john_smith"

  tm_include :ontology_tm
 
  topic :john_smith do
      sid "http://psi.example.com/JohnSmith"
      name  "John Smith"
      name  "Johnny", :scope => :alt_name
      first_name "John" ; last_name  "Smith"
      web_page "http://a.example.com/JohnSmith.htm"
      works_for topic(:example_dot_com){
                              sid "http://www.example.com"
                              name "example.com"; isa :company
                         }, 
    	                :as => :program_manager, 
    	                :scope => :date_2008_02_28
      likes [:italian_opera, :new_york]
      age 35
      description <

Subject-centric blog in XTM (Topic Maps interchange) format

XTM export has been available on the Subject-centric blog from the first day. But I think it was not obvious what readers could do with it. I added a link to the Subject-centric topic map in Omnigator (a Topic Maps browser).

I also recently made the XTM export compatible with the Expressing Dublin Core Metadata Using Topic Maps recommendations.

My plan is to connect (aggregate) Subject-centric with other Topic Maps-related blogs based on a core “Subject-Resource” ontology and a simple “Blogging” ontology.

I see XTM export as a small first step in promoting the SAVE AS XTM INITIATIVE and building the Topic Maps Grid.

Additional resources:

Expressing Dublin Core in Topic Maps

Subject-centric computing and robotics: Osaka will soon be known as the capital of the robotics world..?

I was in Kyoto for three days in December. Osaka-Kobe-Kyoto is a region with a high concentration of companies involved in robotics. I cannot stop thinking about robotics and Subject-centric computing after this trip. Traditionally, when we talk about Subject-centric computing (SCC) and Topic Maps (as an enabling technology), we assume more or less slowly evolving models. In the world of robotics, models evolve in real time.

There are many specialized technologies in robotics such as motion control, sensor information processing, image and speech recognition, planning. But the fundamental SCC concepts of identity and assertions-in-a-context are equally applicable to real- and close-to-real-time scenarios. Robots have to “understand” subjects that are important for humans. “Understanding” means (at least) explicit representations of these subjects inside of robot “brains”.

An interesting observation is that robots will explore new subjects and will generate a lot of new subject identifiers. For example, action planning generates goals and subgoals. Working in a real-life environment means constantly dealing with new subjects, constructing assertions and identifiers for these subjects, and trying to match them with subject representations in memory.

Create-new-or-reuse-existing-subject-proxy is a fundamental question in Subject-centric computing. Traditionally, we rely on a human to make this decision. In the world of robotics, we need to dive into the core of subject identity and the subject recognition process.
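
In its simplest form, the create-or-reuse decision can be sketched as a lookup by shared identifier. The code below is only a toy version under stated assumptions: the class, the `Proxy` struct and the identifiers are hypothetical, and matching on identifier overlap alone is far cruder than the recognition a robot would need.

```ruby
require "set"

class ProxyStore
  Proxy = Struct.new(:identifiers, :assertions)

  def initialize
    @proxies = []
  end

  # Reuse a proxy that shares at least one identifier with the new
  # observation; otherwise create a fresh proxy.
  def find_or_create(identifiers, assertions = [])
    ids = identifiers.to_set
    proxy = @proxies.find { |candidate| candidate.identifiers.intersect?(ids) }
    if proxy
      proxy.identifiers.merge(ids)
      proxy.assertions.concat(assertions)
      [proxy, :reused]
    else
      proxy = Proxy.new(ids, assertions.dup)
      @proxies << proxy
      [proxy, :created]
    end
  end

  def size
    @proxies.size
  end
end

store = ProxyStore.new
# Hypothetical identifiers for an object the robot has seen twice.
_, first  = store.find_or_create(["http://psi.example.com/red_ball"], ["color: red"])
_, second = store.find_or_create(["http://psi.example.com/red_ball", "robot:sighting/42"],
                                 ["location: kitchen"])
puts [first, second, store.size].inspect
```

The interesting cases start where this sketch stops: when a new observation carries no identifier yet and the match has to be made from the assertions themselves.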

I like Lego Mindstorms. I am looking forward to trying some ideas related to Subject-centric computing and robotics in 2008. Specifically, I am interested in investigating these scenarios: creating a map of an unknown “territory” using sensors, “identifying” subjects on a map in a dialog with a human, enriching information about subjects on a map with information from an external “information grid”, and handling an evolving “territory” with automatic recognition of “old” and “new” subjects.