Xesam Search Specification

Note: This page is part of the XESAM specification version 0.9 (also known as RC1). The stable and blessed version will be named 1.0.

Design Goals

History

The xesam search API was originally proposed as two separate DBUS APIs: a Simple API and a fully featured Live API. Lengthy discussions showed that a truly simple API would have extremely limited use, and several attempts to save the Simple API quickly converged towards the Live API. This is why the page name is XesamSearch90.

Terminology

DBUS Names

The primary search engine of the current session should own the bus name org.freedesktop.xesam.searcher on the session bus. The object exposing the primary interface should have the path /org/freedesktop/xesam/searcher/main and implement the interface org.freedesktop.xesam.Search described below.
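A client has to locate the searcher through these three well-known names before it can call anything. The Python sketch below keeps the names as constants and takes the bus and the interface wrapper as parameters, so it can be exercised without a running session bus. The function name connect_searcher is hypothetical and not part of this specification.

```python
from typing import Any, Callable

# Well-known names from the "DBUS Names" section.
XESAM_BUS_NAME = "org.freedesktop.xesam.searcher"
XESAM_OBJECT_PATH = "/org/freedesktop/xesam/searcher/main"
XESAM_INTERFACE = "org.freedesktop.xesam.Search"


def connect_searcher(bus: Any, make_interface: Callable[..., Any]) -> Any:
    """Return a proxy for the primary searcher on `bus`.

    `bus` must offer get_object(bus_name, object_path); `make_interface`
    wraps the returned object so method calls go to XESAM_INTERFACE.
    """
    proxy = bus.get_object(XESAM_BUS_NAME, XESAM_OBJECT_PATH)
    return make_interface(proxy, dbus_interface=XESAM_INTERFACE)
```

With dbus-python this would be called as connect_searcher(dbus.SessionBus(), dbus.Interface).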

org.freedesktop.xesam.Search

NewSession (out s session)

SetProperty (in s session, in s prop, in v val, out v new_val)

GetProperty (in s session, in s prop, out v value)

CloseSession (in s session)

NewSearch (in s session, in s query_xml, out s search)

StartSearch (in s search)

GetHitCount (in s search, out u count)

GetHits (in s search, in u num, out aav hits)

GetHitData (in s search, in au hit_ids, in as fields, out aav hit_data)

CloseSearch (in s search)

GetState (out as state_info)
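For Python clients, the method list above can be written down as a typing.Protocol. This is only a sketch: the class name XesamSearch is hypothetical, and the Python types are an assumed mapping of the DBUS signature codes (s to str, u to int, v to Any, as to List[str], au to List[int], aav to List[List[Any]]).

```python
from typing import Any, List, Protocol


class XesamSearch(Protocol):
    """Method signatures of org.freedesktop.xesam.Search, typed for Python."""

    def NewSession(self) -> str: ...
    def SetProperty(self, session: str, prop: str, val: Any) -> Any: ...
    def GetProperty(self, session: str, prop: str) -> Any: ...
    def CloseSession(self, session: str) -> None: ...
    def NewSearch(self, session: str, query_xml: str) -> str: ...
    def StartSearch(self, search: str) -> None: ...
    def GetHitCount(self, search: str) -> int: ...
    def GetHits(self, search: str, num: int) -> List[List[Any]]: ...
    def GetHitData(self, search: str, hit_ids: List[int],
                   fields: List[str]) -> List[List[Any]]: ...
    def CloseSearch(self, search: str) -> None: ...
    def GetState(self) -> List[str]: ...
```

Note that SetProperty returns new_val, the value the server actually applied, which a client may want to compare with what it asked for.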

Hit Ids

A hit is identified by the sequence number in which it was read with GetHits. For example, the first hit retrieved has id 0, the 10th hit retrieved has id 9, and so on.
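Because ids are implied by read order rather than sent alongside each hit, a client has to count hits itself. A minimal Python sketch, assuming one counter per search handle (the class name HitIdCounter is hypothetical):

```python
from typing import Any, List, Tuple


class HitIdCounter:
    """Tracks hit ids for one search: ids follow the order hits are read with GetHits."""

    def __init__(self) -> None:
        self.next_id = 0  # id the next hit read will receive

    def assign(self, hits: List[List[Any]]) -> List[Tuple[int, List[Any]]]:
        """Pair each hit from one GetHits call with its id."""
        start = self.next_id
        self.next_id += len(hits)
        return list(zip(range(start, self.next_id), hits))
```

Ids keep increasing across successive GetHits calls on the same search, so a second call returning one hit after a first call returning two gives that hit id 2.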

Signals

Each signal includes the handle of the search it refers to. Language bindings (or direct consumers) can use dbus match rules to filter out irrelevant signals, for example those belonging to other xesam consumers' searches. The HitsRemoved and HitsModified signals are only expected when search.live == True, whereas HitsAdded is always emitted regardless of the search.live state.
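A Python sketch of a match rule that subscribes to one signal of one search. It assumes the search handle is the first signal argument, so it can be matched with an arg0 clause (dbus arg matches only apply to string arguments, which the handle is). The helper name is hypothetical, and it does not escape quotes, so it assumes handles contain no apostrophes.

```python
XESAM_INTERFACE = "org.freedesktop.xesam.Search"


def signal_match_rule(member: str, search: str) -> str:
    """Match rule for one xesam signal (e.g. HitsAdded) of one search handle."""
    # arg0 compares the first signal argument, assumed here to be the search
    # handle; quotes inside values are not escaped by this sketch.
    return ",".join([
        "type='signal'",
        f"interface='{XESAM_INTERFACE}'",
        f"member='{member}'",
        f"arg0='{search}'",
    ])
```

The resulting string can be registered with the bus daemon's org.freedesktop.DBus.AddMatch method.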

Session Properties

The values of session properties are expressed as dbus variants - mainly to allow lists of values for a single property. The types allowed for property values are string, integer, boolean, and arrays of said types.
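A Python sketch of a client-side check that a value fits these rules before it is passed to SetProperty. It assumes arrays must be homogeneous, as dbus arrays are, and that floats and nested arrays are not allowed; the function name is hypothetical.

```python
from typing import Any

_SCALARS = (str, int, bool)  # the scalar types allowed for property values


def is_valid_property_value(value: Any) -> bool:
    """True if `value` is a string, integer, boolean, or a flat array of one of those."""
    if isinstance(value, _SCALARS):
        return True
    if isinstance(value, (list, tuple)):
        kinds = {type(item) for item in value}
        # Homogeneous (at most one element type) and no nested arrays or floats.
        return len(kinds) <= 1 and all(isinstance(item, _SCALARS) for item in value)
    return False
```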

[1]: The exact value of this property depends on the xesam metadata spec, which is not yet finished.

Field names vs. session properties: metadata field names and session properties are not the same thing. Generally, a metadata field is something stored in the search engine's index, whereas a property refers to some state stored with a given session.

hit.fields Property, GetHits Return Value

GetHits and GetHitData both return a sorted array of hits. A hit is an array of the fields requested through the session property hit.fields (or, for GetHitData, through its fields argument). Since the return value has the signature aav, a single hit has the form av, which allows field values to be integers, strings or arrays of any type. An array of strings is needed, for example, for email CC fields and for keywords/tags. The returned fields are ordered according to hit.fields. For example, if hit.fields = ["xesam:title", "xesam:userKeywords", "xesam:size"] (field names are defined in XesamOntology90), a return value could look like:

[
  ["Desktop Search Survey", ["xesam", "search", "hot stuff"], 54367],
  ["Gnome Tips and Tricks", ["gnome", "hacking"], 437294]
]
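Since each hit is positional, a client typically pairs it with hit.fields. A minimal Python sketch (the function name is hypothetical) that turns the rows into dictionaries keyed by field name and rejects rows whose length does not match:

```python
from typing import Any, Dict, List, Sequence


def hits_to_dicts(fields: Sequence[str], hits: List[List[Any]]) -> List[Dict[str, Any]]:
    """Pair each value of a hit (signature av) with the field name at the same position."""
    result = []
    for hit in hits:
        if len(hit) != len(fields):
            raise ValueError(f"hit has {len(hit)} values, expected {len(fields)}")
        result.append(dict(zip(fields, hit)))
    return result
```

Applied to the example above, the second hit's "xesam:userKeywords" entry is the list ["gnome", "hacking"].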

Unset Fields: If a server encounters an unset field it should default to the following values, according to the field data type:

Unknown Field Names: If the server gets a request for an unknown field (via GetHitData or through the hit.fields property) it should return an empty string for that field.
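On the server side, the two rules above can be combined when building a hit. A Python sketch under stated assumptions: known_fields stands in for the fields the server's ontologies define, and unset_default is a hypothetical placeholder for the per-type defaults listed under Unset Fields; the function name is also hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Set


def build_hit_row(record: Dict[str, Any],
                  requested: Sequence[str],
                  known_fields: Set[str],
                  unset_default: Callable[[str], Any]) -> List[Any]:
    """Build one hit (signature av) for the requested field names."""
    row: List[Any] = []
    for name in requested:
        if name not in known_fields:
            # Unknown field name: return an empty string.
            row.append("")
        elif name in record:
            row.append(record[name])
        else:
            # Known but unset: defer to the per-type default (placeholder).
            row.append(unset_default(name))
    return row
```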

Field Data Types: The data type of the dbus variant returned for each field is partly determined by the ontology. The rule is that the returned value should convert cleanly, under standard conversions, to the type prescribed by the ontology. For example, if the ontology prescribes an integer for the xesam:width field, the server can return it as a string, an integer or a float: "10", 10 or 10.0. This should not harm clients, since most modern toolkits provide dynamic type conversion; for example, GLib has GValue.transform and Qt has QVariant::convert.
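On the client side this means converting whatever arrives into the type the ontology prescribes. A Python sketch for an integer field such as xesam:width; the strictness (rejecting 10.5 and booleans) is an assumption about what converting cleanly means, and the function name is hypothetical.

```python
from typing import Any


def coerce_int(value: Any) -> int:
    """Convert a field value returned as str, int or float to int, if it converts cleanly."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as a type mismatch.
        raise TypeError("boolean is not an integer field value")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"{value!r} does not convert cleanly to int")
        return int(value)
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            # Accept strings such as "10.0" by going through float.
            return coerce_int(float(value))
    raise TypeError(f"cannot convert {type(value).__name__} to int")
```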

vendor.ontologies Property, Ontology Introspection

The session property vendor.ontologies is used to introspect which ontologies the service vendor knows. A service owning the bus name org.freedesktop.xesam.searcher must know and respect the default xesam-core ontology.

An ontology reference is a triple of strings (unique_name, version, path), and the session property vendor.ontologies is an array of ontology references; that is, it has the dbus signature aas.

For example, a shared online search service using Yahoo and Google as backends might have the following value for vendor.ontologies:

[
        ["yahoo", "1.0", "/usr/share/ontologies/yahoo-1.0"],
        ["google", "1.0", "/usr/share/ontologies/google-1.0"]
]
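A Python sketch that unpacks the aas value into named triples and checks whether a given ontology is advertised, for instance to verify that xesam-core is present. The names OntologyRef, parse_ontologies and knows_ontology are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class OntologyRef(NamedTuple):
    unique_name: str
    version: str
    path: str


def parse_ontologies(value: Sequence[Sequence[str]]) -> List[OntologyRef]:
    """Turn the vendor.ontologies value (aas) into a list of ontology references."""
    refs = []
    for triple in value:
        if len(triple) != 3:
            raise ValueError(f"expected (unique_name, version, path), got {triple!r}")
        refs.append(OntologyRef(*(str(part) for part in triple)))
    return refs


def knows_ontology(refs: Sequence[OntologyRef], unique_name: str) -> bool:
    """True if any advertised ontology has the given unique name."""
    return any(ref.unique_name == unique_name for ref in refs)
```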

The values of the ontology-triples (unique_name, version, path) deserve description:

Ontologies are installed in a directory under {XDG_USER_DATA_DIR,XDG_SYSTEM_DATA_DIR}/ontologies named <unique_name>-<version>.
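A Python sketch of where a client might look for an installed ontology. It assumes the placeholders XDG_USER_DATA_DIR and XDG_SYSTEM_DATA_DIR correspond to $XDG_DATA_HOME and the entries of $XDG_DATA_DIRS from the XDG Base Directory specification (with that specification's defaults), and that the user directory takes precedence; the function name is hypothetical.

```python
import posixpath
from typing import List, Mapping


def ontology_dir_candidates(unique_name: str, version: str,
                            env: Mapping[str, str], home: str) -> List[str]:
    """List possible install directories for one ontology, user directory first."""
    # Assumed mapping: XDG_USER_DATA_DIR -> $XDG_DATA_HOME,
    # XDG_SYSTEM_DATA_DIR -> each entry of $XDG_DATA_DIRS.
    data_home = env.get("XDG_DATA_HOME") or posixpath.join(home, ".local", "share")
    data_dirs = (env.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/").split(":")
    dirname = f"{unique_name}-{version}"
    return [posixpath.join(base, "ontologies", dirname)
            for base in [data_home, *data_dirs] if base]
```

In a real client, env would be os.environ and home would be os.path.expanduser("~").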

FIXME: There should be some kind of metadata for the ontology itself such as a vendor name (the unique name as in the dir-name), ontology version, full vendor name (free form string), ontology description, ontology license. Whether this is stored in a separate file or embedded in the ontology itself (could be done in RDF/XML for example) is another matter to be decided later.

FIXME: We still need consensus on the ontology representation format (RDF vs .ini).

vendor.extensions Property, Query Extensions

The xesam query language supports a number of optional extensions on top of the base language, and the session property vendor.extensions lists the ones an engine supports. For example, a search engine supporting regular expression matching and fuzzy string matching should return

  ["regExp", "fuzzy"]
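A client can read this property before building a query that depends on an extension. A Python sketch, where searcher is any object exposing the org.freedesktop.xesam.Search methods (for example a dbus-python proxy) and the function name is hypothetical:

```python
from typing import Any, List, Sequence


def missing_extensions(searcher: Any, session: str, required: Sequence[str]) -> List[str]:
    """Return the query-language extensions in `required` that the engine does not advertise."""
    advertised = set(searcher.GetProperty(session, "vendor.extensions"))
    return [ext for ext in required if ext not in advertised]
```

An empty result means every required extension is available; otherwise the client can fall back to a query using only the base language.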

Simple Use Case

Retrieve a list of URIs matching a query:

session = NewSession()
search = NewSearch (session, query)
StartSearch(search)
hits = GetHits (search, 1000)
CloseSession (session)
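The same flow as a Python sketch. It assumes the session is blocking (the advanced use case turns search.blocking off explicitly, which suggests blocking is the default), asks for a single URI field named "uri" as in the advanced use case (the exact field name depends on the ontology), and closes the search before the session. Here searcher is any object exposing the org.freedesktop.xesam.Search methods, for example a dbus-python interface proxy; the function name search_uris is hypothetical.

```python
from typing import Any, List


def search_uris(searcher: Any, query_xml: str, limit: int = 1000,
                uri_field: str = "uri") -> List[str]:
    """Run one blocking search and return up to `limit` URIs."""
    session = searcher.NewSession()
    try:
        # Request exactly one field, so each hit has the form [uri].
        searcher.SetProperty(session, "hit.fields", [uri_field])
        search = searcher.NewSearch(session, query_xml)
        try:
            searcher.StartSearch(search)
            hits = searcher.GetHits(search, limit)
        finally:
            searcher.CloseSearch(search)
        return [hit[0] for hit in hits]
    finally:
        searcher.CloseSession(session)
```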

Advanced Use Case

A live search doing non-blocking requests and hinting to the search engine that it will retrieve snippets for each hit:

session = NewSession()
SetProperty (session, "search.live", True)
SetProperty (session, "search.blocking", False)
SetProperty (session, "hit.fields", ["uri","dc:title"])
SetProperty (session, "hit.fields.extended", ["snippet"])
search = NewSearch (session, query)
<register signal handlers and match rules for the search handle>
StartSearch (search)
if HitsAdded(count):
    hits = GetHits(search, count)
    <update ui>
    hit_ids = <ids of the hits just retrieved, see Hit Ids>
    snippets = GetHitData (search, hit_ids, ["snippet"])
    <update ui with snippets>
else if HitsRemoved (hit_ids):
    <remove all affected hits>
else if HitsModified (hit_ids):
    new_data = GetHitData(search, hit_ids, ["uri", "dc:title", "snippet"])
    <update ui with new data>
...
CloseSearch (search)
...
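The pseudocode above can be made concrete as a client-side mirror of one live search. This Python sketch is hypothetical throughout: it assumes the handlers are called with (search, count) for HitsAdded and (search, hit_ids) for HitsRemoved and HitsModified, as the pseudocode suggests, that GetHits returns at most count rows, and that GetHitData returns rows in the order of the requested ids. Wiring the methods to actual dbus signals is left to the binding in use.

```python
from typing import Any, Dict, List, Sequence


class LiveHitView:
    """Client-side mirror of a live search's hits, keyed by hit id."""

    def __init__(self, searcher: Any, search: str,
                 fields: Sequence[str], extended: Sequence[str]) -> None:
        self.searcher = searcher
        self.search = search
        self.fields = list(fields)      # mirrors hit.fields
        self.extended = list(extended)  # mirrors hit.fields.extended
        self.hits: Dict[int, Dict[str, Any]] = {}
        self.next_id = 0  # ids follow the order hits are read with GetHits

    def on_hits_added(self, search: str, count: int) -> None:
        if search != self.search:
            return  # signal belongs to another search
        rows = self.searcher.GetHits(self.search, count)
        ids = list(range(self.next_id, self.next_id + len(rows)))
        self.next_id += len(rows)
        for hit_id, row in zip(ids, rows):
            self.hits[hit_id] = dict(zip(self.fields, row))
        if self.extended and ids:
            # Fetch the expensive fields (e.g. snippets) in a second round trip.
            extra = self.searcher.GetHitData(self.search, ids, self.extended)
            for hit_id, row in zip(ids, extra):
                self.hits[hit_id].update(zip(self.extended, row))

    def on_hits_removed(self, search: str, hit_ids: Sequence[int]) -> None:
        if search != self.search:
            return
        for hit_id in hit_ids:
            self.hits.pop(hit_id, None)

    def on_hits_modified(self, search: str, hit_ids: Sequence[int]) -> None:
        if search != self.search:
            return
        ids: List[int] = list(hit_ids)
        names = self.fields + self.extended
        rows = self.searcher.GetHitData(self.search, ids, names)
        for hit_id, row in zip(ids, rows):
            self.hits[hit_id] = dict(zip(names, row))
```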

Resources

xesam-tools - Command line tool to search xesam services. It is implemented with GObjects in Python. In addition to the command line tool it contains a nice stand-alone xesam module for PyGObject.

XesamSearch90 (last edited 2008-02-18 21:07:23 by localhost)