existdb – Store and retrieve data in an eXist database

Interact with eXist-db XML databases.

This package provides classes to ease interaction with eXist XML databases. It contains the following modules:

Direct database access

Connect to an eXist XML database and query it.

This module provides ExistDB and related classes for connecting to an eXist-db database and executing XQuery queries against it.

When used with Django, ExistDB can pull configuration settings directly from Django settings. If you create an instance of ExistDB without specifying a server url, it will attempt to configure an eXist database based on Django settings, using the configuration names documented below.

Projects that use this module should include the following settings in their settings.py:

# eXist DB settings
EXISTDB_SERVER_USER = 'user'
EXISTDB_SERVER_PASSWORD = 'password'
EXISTDB_SERVER_URL = "http://megaserver.example.com:8042/exist"
EXISTDB_ROOT_COLLECTION = "/sample_collection"

To configure a timeout for most eXist connections, specify the desired time in seconds as EXISTDB_TIMEOUT; if none is specified, the global default socket timeout will be used.
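The sketch below shows this fallback. resolve_existdb_timeout is a hypothetical helper and is not part of eulexistdb. types.SimpleNamespace stands in for Django's settings object. The example only illustrates the documented rule: when EXISTDB_TIMEOUT is missing, the global default socket timeout applies.

```python
import socket
from types import SimpleNamespace


def resolve_existdb_timeout(settings):
    """Hypothetical helper: pick the timeout (in seconds) for eXist requests.

    Uses settings.EXISTDB_TIMEOUT when it is set; otherwise falls back to the
    global default socket timeout, which is None ("no timeout") unless
    socket.setdefaulttimeout() has been called somewhere in the process.
    """
    timeout = getattr(settings, 'EXISTDB_TIMEOUT', None)
    if timeout is None:
        return socket.getdefaulttimeout()
    return timeout


# Stand-ins for a Django settings module, with and without a timeout.
settings_with_timeout = SimpleNamespace(EXISTDB_TIMEOUT=30)
settings_without_timeout = SimpleNamespace()

print(resolve_existdb_timeout(settings_with_timeout))     # 30
print(resolve_existdb_timeout(settings_without_timeout))  # global default
```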

Note

Any configured EXISTDB_TIMEOUT will be ignored by the existdb management command, since reindexing a large collection could take significantly longer than a normal timeout would allow for.

If you are using an eXist index configuration file, you can add another setting to specify your configuration file:

EXISTDB_INDEX_CONFIGFILE = "/path/to/my/exist_index.xconf"

This will allow you to use the existdb management command to manage your index configuration file in eXist.

If you wish to specify options for fulltext queries, you can set a dictionary of options like this:

EXISTDB_FULLTEXT_OPTIONS = {'default-operator': 'and'}

Note

Python xmlrpclib does not support extended types, and eXist uses some of these in its return values. This does not currently affect the functionality exposed within ExistDB. It may cause issues, however, if you call other eXist XML-RPC methods directly through the ExistDB.server XML-RPC connection. If you do use those methods, you may want to enable XML-RPC patching to handle the return types:

from eulexistdb import patch
patch.request_patching(patch.XMLRpcLibPatch)

If you are writing unit tests against code that uses eulexistdb, two utilities may help. eulexistdb.testutil.TestCase loads fixture data into a test eXist-db collection. eulexistdb.testutil.ExistDBTestSuiteRunner has logic to set up and switch configurations between development and test collections in eXist.


class eulexistdb.db.ExistDB(server_url[, resultType[, encoding[, verbose]]])

Connect to an eXist database, and manipulate and query it.

Construction doesn’t initiate server communication. It only stores information about where the server is, for use in later communications.

Parameters:
  • server_url – The eXist server URL. As of 0.20, the primary eXist URL is expected, not the /xmlrpc endpoint. For backwards compatibility, URLs that include /xmlrpc are still handled. They are parsed to set the eXist server path, and also the username and password if the URL contains them. If username and password parameters are also given, they take precedence over any username and password in the server URL.
  • username – eXist username, if any
  • password – eXist user password, if any
  • resultType – The class to use for returning query() results; defaults to QueryResult
  • encoding – The encoding used to communicate with the server; defaults to “UTF-8”
  • verbose – When True, print XML-RPC debugging messages to stdout
  • timeout – Specify a timeout for xmlrpc connection requests. If not specified, the global default socket timeout value will be used.
  • keep_alive – Optional; can be used to disable the built-in session handling of the requests library. Can also be configured in Django settings with EXISTDB_SESSION_KEEP_ALIVE.
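The backwards-compatible handling of server_url can be illustrated with the standard library. split_legacy_url is a hypothetical sketch of the documented behavior. It drops a trailing /xmlrpc, pulls credentials out of the URL, and lets explicit username and password arguments win. It is not eulexistdb's own parsing code, and the URLs are placeholders.

```python
from urllib.parse import urlparse, urlunparse


def split_legacy_url(server_url, username=None, password=None):
    """Hypothetical helper: normalize a pre-0.20 style eXist URL.

    Returns (base_url, username, password), where base_url has any trailing
    /xmlrpc endpoint and any embedded user:password@ removed. Explicit
    username/password arguments take precedence over credentials in the URL.
    """
    parts = urlparse(server_url)
    path = parts.path.rstrip('/')
    if path.endswith('/xmlrpc'):
        path = path[:-len('/xmlrpc')]
    # Rebuild the network location from host and port only, dropping credentials.
    netloc = parts.hostname or ''
    if parts.port is not None:
        netloc = '%s:%d' % (netloc, parts.port)
    base_url = urlunparse((parts.scheme, netloc, path, '', '', ''))
    return (base_url,
            username if username is not None else parts.username,
            password if password is not None else parts.password)


base, user, passwd = split_legacy_url(
    'http://user:secret@localhost:8080/exist/xmlrpc', username='admin')
print(base, user, passwd)  # http://localhost:8080/exist admin secret
```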
getDocument(name)

Retrieve a document from the database.

Parameters:name – database document path to retrieve
Return type:string contents of the document
createCollection(collection_name[, overwrite])

Create a new collection in the database.

Parameters:
  • collection_name – string name of collection
  • overwrite – overwrite an existing collection of the same name? Defaults to False.
Return type:

boolean indicating success

removeCollection(collection_name)

Remove the named collection from the database.

Parameters:collection_name – string name of collection
Return type:boolean indicating success
hasCollection(collection_name)

Check if a collection exists.

Parameters:collection_name – string name of collection
Return type:boolean
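Use hasCollection() with createCollection() when a collection should be created only if it is missing. The sketch below is minimal and assumes db is an ExistDB instance. ensure_collection is a hypothetical helper, and it calls only the two documented methods.

```python
def ensure_collection(db, collection_name):
    """Hypothetical helper: create collection_name unless it already exists.

    db is assumed to be an eulexistdb.db.ExistDB (any object providing
    hasCollection() and createCollection() will do).
    Returns True if this call created the collection; False if the
    collection already existed or creation was not reported as successful.
    """
    if db.hasCollection(collection_name):
        return False
    # createCollection() returns a boolean indicating success
    return bool(db.createCollection(collection_name))
```

For example, call ensure_collection(db, '/sample_collection') before loading documents into that collection.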
load(xml, path)

Insert or overwrite a document in the database.

Parameters:
  • xml – string or file object with the document contents
  • path – destination location in the database
Return type:

boolean indicating success

query(xquery[, start[, how_many]])

Execute an XQuery query, returning the results directly.

Parameters:
  • xquery – a string XQuery query
  • start – first index to return (1-based)
  • how_many – maximum number of items to return
  • cache – boolean, to cache a query and return a session id (optional)
  • session – session id, to retrieve a cached session (optional)
  • release – session id to be released (optional)
Return type:

the resultType specified at the creation of this ExistDB; defaults to QueryResult.
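The start argument is 1-based and how_many caps the batch size. Under that reading, you can call query() repeatedly to walk through a large result set, as in the sketch below. iter_query_results is a hypothetical name. It relies only on the count, hits, and results attributes documented for QueryResult.

```python
def iter_query_results(db, xquery, page_size=10):
    """Hypothetical helper: yield every result item, page_size items per request.

    db is assumed to be an eulexistdb.db.ExistDB whose query() returns a
    QueryResult-like object with count, hits, and results attributes.
    """
    start = 1  # query() positions are 1-based
    while True:
        result = db.query(xquery, start=start, how_many=page_size)
        yield from result.results
        start += result.count
        # Stop on an empty batch or once every hit has been returned.
        if result.count == 0 or start > result.hits:
            break
```

Typical use would be a loop such as for item in iter_query_results(db, xquery, page_size=50), with each batch fetched only when the previous one is exhausted.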

executeQuery(xquery)

Execute an XQuery query, returning a server-provided result handle.

Parameters:xquery – a string XQuery query
Return type:an integer handle identifying the query result for future calls
querySummary(result_id)

Retrieve results summary from a past query.

Parameters:result_id – an integer handle returned by executeQuery()
Return type:a dict describing the results

The returned dict has four fields:

  • queryTime: processing time in milliseconds

  • hits: number of hits in the result set

  • documents: a list of lists. Each identifies a document and takes the form [doc_id, doc_name, hits], where:

    • doc_id: an internal integer identifier for the document
    • doc_name: the name of the document as a string
    • hits: the number of hits within that document
  • doctype: a list of lists. Each contains a doctype public identifier and the number of hits found for this doctype.
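The documents entry of a querySummary() result can be flattened into a hit count per document. hits_by_document is a hypothetical helper. The summary dict below uses invented values in the documented shape.

```python
def hits_by_document(summary):
    """Hypothetical helper: map document name -> hit count.

    summary is a querySummary() dict; each entry in summary['documents']
    has the documented form [doc_id, doc_name, hits].
    """
    return {doc_name: hits for _doc_id, doc_name, hits in summary['documents']}


# Invented example values, shaped like a querySummary() result.
example_summary = {
    'queryTime': 12,
    'hits': 5,
    'documents': [[101, 'letter1.xml', 3], [102, 'letter2.xml', 2]],
    'doctype': [],
}

print(hits_by_document(example_summary))  # {'letter1.xml': 3, 'letter2.xml': 2}
```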

getHits(result_id)

Get the number of hits in a query result.

Parameters:result_id – an integer handle returned by executeQuery()
Return type:integer representing the number of hits
retrieve(result_id, position)

Retrieve a single result fragment.

Parameters:
  • result_id – an integer handle returned by executeQuery()
  • position – the result index to return
  • highlight – enable search term highlighting in result; optional, defaults to False
Return type:

the query result item as a string

releaseQueryResult(result_id)

Release a result set handle in the server.

Parameters:result_id – an integer handle returned by executeQuery()
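Together, executeQuery(), getHits(), retrieve(), and releaseQueryResult() form a handle-based alternative to query(). The sketch below is a hypothetical helper that assumes db is an ExistDB. It releases the server-side handle in a finally block, so the handle is not leaked if retrieval fails. It also assumes that retrieve() positions are 0-based, which this page does not state; check this against your eXist version before relying on it.

```python
def retrieve_all(db, xquery):
    """Hypothetical helper: run xquery via a result handle and return all items.

    db is assumed to be an eulexistdb.db.ExistDB.
    ASSUMPTION: retrieve() positions are 0-based (not documented here).
    The handle is always released, even if retrieval raises.
    """
    result_id = db.executeQuery(xquery)
    try:
        hits = db.getHits(result_id)
        return [db.retrieve(result_id, position) for position in range(hits)]
    finally:
        db.releaseQueryResult(result_id)
```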
create_account(username, password, groups)

Create a user account. Returns True if the user was created, or False if the user already exists. Any other eXist exception is re-raised.

create_group(group)

Create a group. Returns True if the group was created, or False if the group already exists. Any other eXist exception is re-raised.
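create_group() and create_account() return False rather than raising when the group or user already exists. This means setup code can call them unconditionally. The sketch below assumes db is an ExistDB. ensure_user is hypothetical, and passing groups as a list of group names is an assumption this page does not spell out.

```python
def ensure_user(db, username, password, group):
    """Hypothetical helper: create group if needed, then create the account.

    db is assumed to be an eulexistdb.db.ExistDB. Both calls return False
    instead of raising when the group or account already exists, so this
    can be re-run safely; an existing account is left unchanged.
    ASSUMPTION: create_account() takes groups as a list of group names.
    Returns True if a new account was created.
    """
    db.create_group(group)
    return db.create_account(username, password, [group])
```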

describeDocument(document_path)

Return information about a document in eXist. Includes name, owner, group, created date, permissions, mime-type, type, content-length. Returns an empty dictionary if document is not found.

Parameters:document_path – string full path to document in eXist
Return type:dictionary
getCollectionDescription(collection_name)

Retrieve information about a collection.

Parameters:collection_name – string name of collection
Return type:dictionary describing the collection
getDoc(name)

Alias for getDocument().

getPermissions(resource)

Retrieve permissions for a resource in eXist.

Parameters:resource – full path to a collection or document in eXist
Return type:ExistPermissions
hasCollectionIndex(collection_name)

Check if the specified collection has an index configuration in eXist.

Note: according to the eXist documentation, the index configuration file does not have to be named collection.xconf, for reasons of backward compatibility. This function assumes that the recommended naming convention is followed.

Parameters:collection_name – name of the collection to check
Return type:boolean indicating collection index is present
hasDocument(document_path)

Check if a document is present in eXist.

Parameters:document_path – string full path to document in eXist
Return type:boolean
loadCollectionIndex(collection_name, index)

Load an index configuration for the specified collection. Creates the eXist system config collection if it is not already there, and loads the specified index config file, as per eXist collection and index naming conventions.

Parameters:
  • collection_name – name of the collection to be indexed
  • index – string or file object with the document contents (as used by load())
Return type:

boolean indicating success
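A call to loadCollectionIndex() is typically followed by reindexCollection(), so that existing documents pick up the new configuration. The existdb management command exposes similar operations. The sketch below is hypothetical and assumes db is an ExistDB connected as a member of the DBA group, which reindexing requires.

```python
def update_collection_index(db, collection_name, index):
    """Hypothetical helper: load an index configuration, then reindex.

    db: assumed to be an eulexistdb.db.ExistDB.
    index: string or file object with the index configuration contents,
    as accepted by loadCollectionIndex().
    Returns True only if both steps report success.
    """
    if not db.loadCollectionIndex(collection_name, index):
        return False
    # Reindexing requires an eXist user in the DBA group.
    return bool(db.reindexCollection(collection_name))
```

In a Django project, the index argument could come from opening settings.EXISTDB_INDEX_CONFIGFILE inside a with block. The collection_name argument could be settings.EXISTDB_ROOT_COLLECTION.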

moveDocument(from_collection, to_collection, document)

Move a document in eXist from one collection to another.

Parameters:
  • from_collection – collection where the document currently exists
  • to_collection – collection where the document should be moved
  • document – name of the document in eXist
Return type:

boolean

reindexCollection(collection_name)

Reindex a collection. Reindex will fail if the eXist user does not have the correct permissions within eXist (must be a member of the DBA group).

Parameters:collection_name – string name of collection
Return type:boolean success
removeCollectionIndex(collection_name)

Remove the index configuration for the specified collection. If the configuration collection has no documents or subcollections left after the index file is removed, the configuration collection is also removed.

Parameters:collection_name – name of the collection whose index should be removed
Return type:boolean indicating success
removeDocument(name)

Remove a document from the database.

Parameters:name – full eXist path to the database document to be removed
Return type:boolean indicating success
setPermissions(resource, permissions)

Set permissions on a resource in eXist.

Parameters:
  • resource – full path to a collection or document in eXist
  • permissions – int or string permissions statement
class eulexistdb.db.QueryResult(node=None, context=None, **kwargs)

The results of an eXist XQuery query

count

The number of results returned in this chunk

hits

The total number of hits found by the search

results

The result documents themselves as nodes, starting at start and containing count members

exception eulexistdb.db.ExistDBException

A handy wrapper for all errors returned by the eXist server.

Object-based searching

Provide a prettier, more Pythonic approach to eXist-db access.

This module provides QuerySet modeled after Django QuerySet objects. It’s not dependent on Django at all, but it aims to function as a stand-in replacement in any context that expects one.

class eulexistdb.query.QuerySet(model=None, xpath=None, using=None, collection=None, xquery=None, fulltext_options={})

Lazy eXist database lookup for a set of objects.

Parameters:
  • model – the type of object to return from __getitem__(). If set, the resulting xml nodes will be wrapped in objects of this type. Some methods, like filter() and only(), only make sense if this is set. While this argument can be any callable object, it is typically a subclass of XmlObject.
  • xpath – an XPath expression where this QuerySet will begin filtering. Typically this is left unset, so the QuerySet begins with an unfiltered collection; filtering is then added with filter().
  • using – The ExistDB to query against.
  • collection – If set, search only within a particular eXist-db collection. Otherwise search all collections.
  • xquery – Override the entire Xquery object used for internal query serialization. Most code will leave this unset, which uses a default Xquery.
  • fulltext_options – optional dictionary of fulltext options to be used as settings for any full-text queries. See http://demo.exist-db.org/lucene.xml#N1047C for available options. Requires a version of eXist that supports this feature.
all()

Return all results.

This method returns an identical copy of the QuerySet.

also(*fields)

Return additional data in results.

Parameters:fields – names of fields in the QuerySet’s model

This method returns an updated copy of the QuerySet: It does not modify the original. When results are returned from the updated copy, they will contain the specified additional fields.

For special fields available, see only().

For performance considerations, see note on only().

also_raw(**fields)

Return an additional field by raw xpath. Similar to (and can be combined with) also(), but xpath is not pulled from the model. Use this when you want to retrieve a field with a different xpath than the one configured in your model. See Xquery.return_only() for details on specifying xpaths in raw mode.

Parameters:
  • fields – one or more field=xpath keyword arguments, where the keyword is the field name and the value is the raw xpath used to retrieve it. If the field name matches a field on the associated model, the result of the raw xpath should be accessible on the returned object as the normal property.

Can be combined with also().

Example usage:

qs.also_raw(field_matches='count(util:expand(%(xq_var)s//field)//exist:match)')
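The %(xq_var)s token in the raw xpath has the form of a Python %-style named placeholder. The query builder presumably fills it in with its own XQuery variable. Assuming that is how it works, the substitution looks like this. The name $n is illustrative only; the real variable name is chosen when the QuerySet builds its XQuery.

```python
# Raw xpath as passed to also_raw(), with a %-style named placeholder.
raw_xpath = 'count(util:expand(%(xq_var)s//field)//exist:match)'

# '$n' is an illustrative XQuery variable name, not the one eulexistdb uses.
expanded = raw_xpath % {'xq_var': '$n'}
print(expanded)  # count(util:expand($n//field)//exist:match)
```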
count()

Return the cached query hit count, executing the query first if it has not yet executed.

distinct()

Return distinct results.

This method returns an updated copy of the QuerySet: It does not modify the original. When results are returned from the updated copy, they will contain only distinct results.

exclude(**kwargs)

Filter the QuerySet to return a subset of the documents that do not match any of the filters. Uses the same syntax and allows the same filters as filter().

filter(combine='AND', **kwargs)

Filter the QuerySet to return a subset of the documents.

Arguments take the form lookuptype or field__lookuptype, where field is the name of a field in the QuerySet’s model and lookuptype is one of:

  • exact – The field or object matches the argument value.

  • contains – The field or object contains the argument value.

  • startswith – The field or object starts with the argument value.

  • fulltext_terms – the field or object contains any of the argument terms anywhere in the full text; requires a properly configured Lucene index. By default, highlighting is enabled when this filter is used; to turn it off, specify an additional filter of highlight=False. Using fulltext_score in the return fields and for ordering is recommended.

  • highlight - highlight search terms; when used with fulltext_terms, should be specified as a boolean (enabled by default); when used separately, takes a string using the same search format as fulltext_terms, but content will be returned even if it does not include the search terms. Requires a properly configured lucene index.

  • in - field or object is present in a list of values

  • exists - field or object is or is not present in the document; if True, the field must be present; if False, it must not be present.

  • document_path - restrict the query to a single document; this must be a document path as returned by eXist, with full db path

  • gt, gte, lt, lte - greater than, greater than or equal, less than, less than or equal

Field may be in the format field__subfield when field is a NodeField or NodeListField and subfield is a configured element on that object.

Field may also be one of the predefined ‘special’ fields; see only() for the list of fields.

Any number of these filter arguments may be passed. This method returns an updated copy of the QuerySet: It does not modify the original.

Parameters:combine – optional; specify how the filters should be combined. Defaults to AND; also supports OR.
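To make the keyword syntax concrete, the hypothetical helper below splits a filter() keyword into its field path and lookup type. It strictly follows the forms listed above: lookuptype, field__lookuptype, and field__subfield__lookuptype. It illustrates the naming convention only. It is not eulexistdb's parser, and it gives the special fields no particular treatment.

```python
# Lookup types listed in the filter() documentation.
LOOKUPS = {
    'exact', 'contains', 'startswith', 'fulltext_terms', 'highlight',
    'in', 'exists', 'document_path', 'gt', 'gte', 'lt', 'lte',
}


def split_lookup(keyword):
    """Hypothetical helper: split a filter() keyword into (field path, lookup).

    'startswith'          -> ([], 'startswith')
    'title__contains'     -> (['title'], 'contains')
    'author__name__exact' -> (['author', 'name'], 'exact')
    """
    *fields, lookup = keyword.split('__')
    if lookup not in LOOKUPS:
        raise ValueError('unknown lookup type: %r' % lookup)
    return fields, lookup
```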
get(**kwargs)

Get a single result identified by filter arguments.

Takes any number of filter() arguments. Unlike filter(), though, this method returns exactly one item. If the filter expressions match no items, or if they match more than one, this method throws an exception.

Raises a eulexistdb.exceptions.DoesNotExist exception if no matches are found; raises a eulexistdb.exceptions.ReturnedMultiple exception if more than one match is found.
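The get() contract (exactly one match, otherwise an exception) can be mirrored over a plain Python iterable, as below. The exception classes here are local stand-ins named after eulexistdb.exceptions.DoesNotExist and ReturnedMultiple. Real code should catch the library's own classes instead.

```python
class DoesNotExist(Exception):
    """Local stand-in for eulexistdb.exceptions.DoesNotExist."""


class ReturnedMultiple(Exception):
    """Local stand-in for eulexistdb.exceptions.ReturnedMultiple."""


def get_one(matches):
    """Return the single item in matches, mirroring the get() contract."""
    matches = list(matches)
    if not matches:
        raise DoesNotExist('no matching item found')
    if len(matches) > 1:
        raise ReturnedMultiple('%d matching items found' % len(matches))
    return matches[0]
```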

getDocument(docname)

Get a single document from the server by filename.

only(*fields)

Limit results to include only specified fields.

Parameters:fields – names of fields in the QuerySet’s model

This method returns an updated copy of the QuerySet: It does not modify the original. When results are returned from the updated copy, they will contain only the specified fields.

Special fields available:
  • fulltext_score - Lucene relevance score; should only be used when a fulltext query has been used
  • document_name, collection_name - document or collection name where xml content is stored in eXist
  • hash - generate and return a SHA-1 checksum of the root element being queried
  • last_modified - DateTimeField for the date when the document containing the xml element was last modified

NOTE: Be aware that this will result in an XQuery with a constructed return. For large queries, this may have a significant impact on performance. For more details, see http://exist.sourceforge.net/tuning.html#N103A2 .

only_raw(**fields)

Limit results to include only specified fields, and return the specified field by xpath. Similar to (and can be combined with) only(). See Xquery.return_only() for details on specifying xpaths in raw mode.

See also_raw() for more details and usage example.

or_filter(**kwargs)

Filter the QuerySet to return a subset of the documents, but combine the filters with OR instead of AND. Uses the same syntax and allows for the same filters as filter() with the exception that currently predefined special fields (see only()) are not supported.

order_by(field)

Order results returned according to a specified field. By default, all sorting is case-sensitive and in ascending order.

Parameters:field – the name (a string) of a field in the QuerySet’s model. If the field is prefixed with ‘-’, results will be sorted in descending order. If the field is prefixed with ‘~’, results will use a case-insensitive sort. The flags ‘-’ and ‘~’ may be combined in any order.

Example usage:

queryset.filter(fulltext_terms='foo').order_by('-fulltext_score')
queryset.order_by('~name')

This method returns an updated copy of the QuerySet. It does not modify the original.
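The hypothetical sketch below decodes the '-' and '~' prefixes. It also sorts plain Python strings in the way the flags describe. This is for illustration only: eXist performs the actual ordering, and its collation may differ from Python's code-point ordering.

```python
def parse_order_field(spec):
    """Hypothetical helper: split an order_by() argument into its parts.

    Leading '-' (descending) and '~' (case-insensitive) flags may appear in
    either order, e.g. '-~name' or '~-name'.
    Returns (field_name, descending, case_insensitive).
    """
    descending = case_insensitive = False
    while spec and spec[0] in '-~':
        if spec[0] == '-':
            descending = True
        else:
            case_insensitive = True
        spec = spec[1:]
    return spec, descending, case_insensitive


def sort_like_order_by(values, spec):
    """Sort plain strings the way the order_by() flags describe."""
    _field, descending, case_insensitive = parse_order_field(spec)
    key = str.lower if case_insensitive else None
    return sorted(values, key=key, reverse=descending)


print(sort_like_order_by(['banana', 'Cherry', 'apple'], '~name'))
# ['apple', 'banana', 'Cherry']
```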

order_by_raw(xpath, ascending=True)

Order results returned by a raw XPath.

Parameters:xpath – the xpath to be used

This method returns an updated copy of the QuerySet. It does not modify the original.

Example usage:

qs.order_by_raw('min(%(xq_var)s//date/string())')
query_result_type

Custom query result return type, used to access a batch of results wrapped in an eXist result as returned by the REST API. Extends eulexistdb.db.QueryResult to add item-level result mapping, using return_type if appropriate.

reset()

Reset filters and cached results on the QuerySet.

This modifies the current query set, removing all filters, and resetting cached results.

result_id

Return the cached server result id, executing the query first if it has not yet executed.

return_type

Return type that will be used for initializing results returned from eXist queries. Either the subclass of XmlObject passed in to the constructor as model, or, if only() or also() has been used, a dynamically created instance of XmlObject with the xpaths modified based on the constructed xml return.

using(collection)

Specify the eXist collection to be queried.

If you are using an eulexistdb.models.XmlModel to generate queries against an eXist collection other than the one defined in settings.EXISTDB_ROOT_COLLECTION, you should use this function.

class eulexistdb.query.XmlQuery(node=None, context=None, **kwargs)

XmlObject class to allow describing queries in xml.

Django tie-ins for eulexistdb

Custom Template Tags

eulexistdb Management commands

The following management command will be available when you include eulexistdb in your Django INSTALLED_APPS and rely on the existdb settings described above.

For more details on these commands, use manage.py help <command>.

  • existdb - update, remove, and show information about the index configuration for a collection index; reindex the configured collection based on that index configuration

testutil Unit Test utilities

debug_panel Debug Toolbar Panel