existdb – Store and retrieve data in an eXist database
Interact with eXist-db XML databases.
This package provides classes to ease interaction with eXist XML databases. It contains the following modules:
- eulexistdb.db – Connect to the database and query
- eulexistdb.query – Query XmlObject models from eXist with semantics like a Django QuerySet
Direct database access
Connect to an eXist XML database and query it.
This module provides ExistDB and related classes for connecting to an eXist-db database and executing XQuery queries against it.
When used with Django, ExistDB can pull configuration settings directly from Django settings. If you create an instance of ExistDB without specifying a server url, it will attempt to configure an eXist database based on Django settings, using the configuration names documented below.
Projects that use this module should include the following settings in their settings.py:
#Exist DB Settings
EXISTDB_SERVER_USER = 'user'
EXISTDB_SERVER_PASSWORD = 'password'
EXISTDB_SERVER_URL = "http://megaserver.example.com:8042/exist"
EXISTDB_ROOT_COLLECTION = "/sample_collection"
To configure a timeout for most eXist connections, specify the desired time in seconds as EXISTDB_TIMEOUT; if none is specified, the global default socket timeout will be used.
Note
Any configured EXISTDB_TIMEOUT will be ignored by the existdb management command, since reindexing a large collection could take significantly longer than a normal timeout would allow for.
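The timeout fallback described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's own code: the helper name effective_timeout is hypothetical, and the settings object is assumed to be anything with attribute-style access (such as Django's settings).

```python
import socket
from types import SimpleNamespace


def effective_timeout(settings):
    """Return EXISTDB_TIMEOUT if configured, else the global socket default.

    Hypothetical helper for illustration; not part of eulexistdb.
    """
    configured = getattr(settings, "EXISTDB_TIMEOUT", None)
    if configured is not None:
        return configured
    # socket.getdefaulttimeout() returns None when no global default is set,
    # which means sockets are created without a timeout.
    return socket.getdefaulttimeout()


# Stand-in settings objects with assumed values, for demonstration only.
with_timeout = SimpleNamespace(EXISTDB_TIMEOUT=30)
without_timeout = SimpleNamespace()

print(effective_timeout(with_timeout))
print(effective_timeout(without_timeout) == socket.getdefaulttimeout())
```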
If you are using an eXist index configuration file, you can add another setting to specify your configuration file:
EXISTDB_INDEX_CONFIGFILE = "/path/to/my/exist_index.xconf"
This will allow you to use the existdb management command to manage your index configuration file in eXist.
If you wish to specify options for fulltext queries, you can set a dictionary of options like this:
EXISTDB_FULLTEXT_OPTIONS = {'default-operator': 'and'}
Note
Python xmlrpclib does not support extended types, some of which are used in eXist returns. This does not currently affect the functionality exposed within ExistDB, but may cause issues if you use the ExistDB.server XML-RPC connection directly for other available eXist XML-RPC methods. If you do make use of those, you may want to enable XML-RPC patching to handle the return types:
from eulexistdb import patch
patch.request_patching(patch.XMLRpcLibPatch)
If you are writing unit tests against code that uses eulexistdb, you may want to take advantage of eulexistdb.testutil.TestCase for loading fixture data to a test eXist-db collection, and eulexistdb.testutil.ExistDBTestSuiteRunner, which has logic to set up and switch configurations between development and test collections in eXist.
-
class eulexistdb.db.ExistDB(server_url[, resultType[, encoding[, verbose]]])
Connect to an eXist database, and manipulate and query it.
Construction doesn’t initiate server communication; it only stores information about where the server is, to be used in later communications.
Parameters:
- server_url – The eXist server URL. New syntax (as of 0.20) expects the primary eXist url and not the /xmlrpc endpoint; for backwards compatibility, urls that include /xmlrpc are still handled, and will be parsed to set the eXist server path as well as username and password if specified. Note that the username and password parameters take precedence over a username and password in the server url if both are specified.
- username – eXist username, if any
- password – eXist user password, if any
- resultType – The class to use for returning query() results; defaults to QueryResult
- encoding – The encoding used to communicate with the server; defaults to “UTF-8”
- verbose – When True, print XML-RPC debugging messages to stdout
- timeout – Specify a timeout for xmlrpc connection requests. If not specified, the global default socket timeout value will be used.
- keep_alive – Optional parameter, to disable requests built-in session handling; can also be configured in django settings with EXISTDB_SESSION_KEEP_ALIVE
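The handling of a legacy /xmlrpc url and the precedence of explicit credentials, as described for server_url, might look roughly like the following. This is a minimal sketch using only the standard library; the function name split_server_url and the example urls are hypothetical, and the code is not taken from eulexistdb itself.

```python
from urllib.parse import urlsplit, urlunsplit

XMLRPC_SUFFIX = "/xmlrpc"


def split_server_url(server_url, username=None, password=None):
    """Return (base_url, username, password) for an eXist server url.

    Hypothetical helper: strips a trailing /xmlrpc endpoint and any
    credentials embedded in the url. Explicit arguments win over the
    credentials found in the url.
    """
    parts = urlsplit(server_url)
    user = username if username is not None else parts.username
    secret = password if password is not None else parts.password
    # Rebuild the network location without the user:password@ prefix.
    host = parts.hostname or ""
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    path = parts.path.rstrip("/")
    if path.endswith(XMLRPC_SUFFIX):
        path = path[: -len(XMLRPC_SUFFIX)]
    return urlunsplit((parts.scheme, host, path, "", "")), user, secret


# Assumed example urls, for demonstration only.
legacy = split_server_url("http://user:secret@localhost:8080/exist/xmlrpc")
override = split_server_url(
    "http://user:secret@localhost:8080/exist/xmlrpc",
    username="admin",
    password="other",
)
modern = split_server_url("http://localhost:8080/exist")
```

All three calls resolve to the same base url; only the credentials differ depending on where they were supplied.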
-
getDocument(name)
Retrieve a document from the database.
Parameters: name – database document path to retrieve
Return type: string contents of the document
-
createCollection(collection_name[, overwrite])
Create a new collection in the database.
Parameters:
- collection_name – string name of collection
- overwrite – overwrite existing document?
Return type: boolean indicating success
-
removeCollection(collection_name)
Remove the named collection from the database.
Parameters: collection_name – string name of collection
Return type: boolean indicating success
-
hasCollection(collection_name)
Check if a collection exists.
Parameters: collection_name – string name of collection
Return type: boolean
-
load(xml, path[, overwrite])
Insert or overwrite a document in the database.
Parameters:
- xml – string or file object with the document contents
- path – destination location in the database
Return type: boolean indicating success
-
query(xquery[, start[, how_many]])
Execute an XQuery query, returning the results directly.
Parameters:
- xquery – a string XQuery query
- start – first index to return (1-based)
- how_many – maximum number of items to return
- cache – boolean, to cache a query and return a session id (optional)
- session – session id, to retrieve a cached session (optional)
- release – session id to be released (optional)
Return type: the resultType specified at the creation of this ExistDB; defaults to QueryResult.
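Because start is 1-based, paging through results takes a small amount of arithmetic. The helpers below are a sketch of one way to compute the start and how_many values for a given page; the names page_window and total_pages are hypothetical and not part of eulexistdb.

```python
def page_window(page_number, page_size):
    """Return (start, how_many) for a 1-based page number.

    Hypothetical helper: start follows the 1-based convention that
    query() documents for its start parameter.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    start = (page_number - 1) * page_size + 1
    return start, page_size


def total_pages(hits, page_size):
    """Return how many pages are needed to show `hits` results."""
    # Ceiling division without importing math.
    return -(-hits // page_size)


first_page = page_window(1, 10)
third_page = page_window(3, 10)
```

The resulting pair could then be passed as the start and how_many arguments of query(), with the total hit count of the result used to decide how many pages exist.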
-
executeQuery(xquery)
Execute an XQuery query, returning a server-provided result handle.
Parameters: xquery – a string XQuery query
Return type: an integer handle identifying the query result for future calls
-
querySummary(result_id)
Retrieve results summary from a past query.
Parameters: result_id – an integer handle returned by executeQuery()
Return type: a dict describing the results
The returned dict has four fields:
- queryTime: processing time in milliseconds
- hits: number of hits in the result set
- documents: a list of lists. Each identifies a document and takes the form [doc_id, doc_name, hits], where:
  - doc_id: an internal integer identifier for the document
  - doc_name: the name of the document as a string
  - hits: the number of hits within that document
- doctype: a list of lists. Each contains a doctype public identifier and the number of hits found for this doctype.
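To make the shape of the summary dict concrete, the sketch below builds a sample with the four documented fields and tallies hits per document. All values in sample_summary are invented for illustration, and hits_by_document is a hypothetical helper, not a library function.

```python
def hits_by_document(summary):
    """Map each document name to its hit count.

    Expects the documented [doc_id, doc_name, hits] triples under the
    "documents" key of a querySummary()-style dict.
    """
    return {doc_name: hits for _doc_id, doc_name, hits in summary["documents"]}


# Invented sample data that follows the documented structure.
sample_summary = {
    "queryTime": 12,
    "hits": 5,
    "documents": [
        [101, "/db/sample_collection/a.xml", 3],
        [102, "/db/sample_collection/b.xml", 2],
    ],
    "doctype": [["-//EXAMPLE//DTD Sample//EN", 5]],
}

per_document = hits_by_document(sample_summary)
```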
-
getHits(result_id)
Get the number of hits in a query result.
Parameters: result_id – an integer handle returned by executeQuery()
Return type: integer representing the number of hits
-
retrieve(result_id, position)
Retrieve a single result fragment.
Parameters:
- result_id – an integer handle returned by executeQuery()
- position – the result index to return
- highlight – enable search term highlighting in result; optional, defaults to False
Return type: the query result item as a string
-
releaseQueryResult(result_id)
Release a result set handle in the server.
Parameters: result_id – an integer handle returned by executeQuery()
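The handle-based methods above (executeQuery(), getHits(), retrieve(), releaseQueryResult()) are meant to be used together. The following sketch shows one way to guarantee the handle is released even if retrieval fails. It runs against StubDB, an in-memory stand-in defined here purely for demonstration; it is not ExistDB. The helper names and the assumption that positions are 0-based are not confirmed by this documentation.

```python
from contextlib import contextmanager


@contextmanager
def query_handle(db, xquery):
    """Run executeQuery() and always release the handle afterwards."""
    result_id = db.executeQuery(xquery)
    try:
        yield result_id
    finally:
        db.releaseQueryResult(result_id)


def fetch_all(db, xquery):
    """Return every result fragment for a query as a list of strings."""
    with query_handle(db, xquery) as result_id:
        total = db.getHits(result_id)
        # Assumption: positions are 0-based; adjust if your server differs.
        return [db.retrieve(result_id, position) for position in range(total)]


class StubDB:
    """In-memory stand-in exposing the four method names; NOT ExistDB."""

    def __init__(self, items):
        self._items = list(items)
        self.released = []

    def executeQuery(self, xquery):
        return 1  # fixed fake handle

    def getHits(self, result_id):
        return len(self._items)

    def retrieve(self, result_id, position):
        return self._items[position]

    def releaseQueryResult(self, result_id):
        self.released.append(result_id)


stub = StubDB(["<a/>", "<b/>"])
fragments = fetch_all(stub, "/root/*")
```

Wrapping the handle in a context manager keeps the release step next to the acquisition step, so a failure in retrieve() does not leave a result set open on the server.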
-
create_account(username, password, groups)
Create a user account; returns true if the user was created, false if the user already exists. Any other exist exception is re-raised.
-
create_group(group)
Create a group; returns true if the group was created, false if the group already exists. Any other exist exception is re-raised.
-
describeDocument(*args, **kwargs)
Return information about a document in eXist. Includes name, owner, group, created date, permissions, mime-type, type, content-length. Returns an empty dictionary if document is not found.
Parameters: document_path – string full path to document in eXist
Return type: dictionary
-
getCollectionDescription(*args, **kwargs)
Retrieve information about a collection.
Parameters: collection_name – string name of collection
Return type: boolean
-
getDoc(name)
Alias for getDocument().
-
getPermissions(*args, **kwargs)
Retrieve permissions for a resource in eXist.
Parameters: resource – full path to a collection or document in eXist
Return type: ExistPermissions
-
hasCollectionIndex(collection_name)
Check if the specified collection has an index configuration in eXist.
Note: according to eXist documentation, the index config file does not have to be named collection.xconf, for reasons of backward compatibility. This function assumes that the recommended naming conventions are followed.
Parameters: collection_name – name of the collection to check for an index configuration
Return type: boolean indicating collection index is present
-
hasDocument(*args, **kwargs)
Check if a document is present in eXist.
Parameters: document_path – string full path to document in eXist
Return type: boolean
-
loadCollectionIndex(collection_name, index)
Load an index configuration for the specified collection. Creates the eXist system config collection if it is not already there, and loads the specified index config file, as per eXist collection and index naming conventions.
Parameters:
- collection_name – name of the collection to be indexed
- index – string or file object with the document contents (as used by load())
Return type: boolean indicating success
-
moveDocument(*args, **kwargs)
Move a document in eXist from one collection to another.
Parameters:
- from_collection – collection where the document currently exists
- to_collection – collection where the document should be moved
- document – name of the document in eXist
Return type: boolean
-
reindexCollection(collection_name)
Reindex a collection. Reindex will fail if the eXist user does not have the correct permissions within eXist (must be a member of the DBA group).
Parameters: collection_name – string name of collection
Return type: boolean success
-
removeCollectionIndex(collection_name)
Remove index configuration for the specified collection. If the index collection has no documents or subcollections after the index file is removed, the configuration collection will also be removed.
Parameters: collection_name – name of the collection with an index to be removed
Return type: boolean indicating success
-
removeDocument(*args, **kwargs)
Remove a document from the database.
Parameters: name – full eXist path to the database document to be removed
Return type: boolean indicating success
-
setPermissions(*args, **kwargs)
Set permissions on a resource in eXist.
Parameters:
- resource – full path to a collection or document in eXist
- permissions – int or string permissions statement
-
class eulexistdb.db.QueryResult(node=None, context=None, **kwargs)
The results of an eXist XQuery query
-
count
The number of results returned in this chunk
-
hits
The total number of hits found by the search
-
exception eulexistdb.db.ExistDBException
A handy wrapper for all errors returned by the eXist server.
Object-based searching
Provide a prettier, more Pythonic approach to eXist-db access.
This module provides QuerySet, modeled after Django QuerySet objects. It’s not dependent on Django at all, but it aims to function as a stand-in replacement in any context that expects one.
-
class eulexistdb.query.QuerySet(model=None, xpath=None, using=None, collection=None, xquery=None, fulltext_options={})
Lazy eXist database lookup for a set of objects.
Parameters:
- model – the type of object to return from __getitem__(). If set, the resulting xml nodes will be wrapped in objects of this type. Some methods, like filter() and only(), only make sense if this is set. While this argument can be any callable object, it is typically a subclass of XmlObject.
- xpath – an XPath expression where this QuerySet will begin filtering. Typically this is left out, beginning with an unfiltered collection; filtering is then added with filter().
- using – The ExistDB to query against.
- collection – If set, search only within a particular eXist-db collection. Otherwise search all collections.
- xquery – Override the entire Xquery object used for internal query serialization. Most code will leave this unset, which uses a default Xquery.
- fulltext_options – optional dictionary of fulltext options to be used as settings for any full-text queries. See http://demo.exist-db.org/lucene.xml#N1047C for available options. Requires a version of eXist that supports this feature.
-
all()
Return all results.
This method returns an identical copy of the QuerySet.
-
also(*fields)
Return additional data in results.
Parameters: fields – names of fields in the QuerySet’s model
This method returns an updated copy of the QuerySet: It does not modify the original. When results are returned from the updated copy, they will contain the specified additional fields.
For special fields available, see only(). For performance considerations, see the note on only().
-
also_raw(**fields)
Return an additional field by raw xpath. Similar to (and can be combined with) also(), but xpath is not pulled from the model. Use this when you want to retrieve a field with a different xpath than the one configured in your model. See Xquery.return_only() for details on specifying xpaths in raw mode.
Parameters:
- fields – field name and xpath in keyword-args notation. If field is the name of a field on the associated model, the result of the raw xpath should be accessible on the return object as the normal property.
- xpath – xpath for retrieving the specified field
Can be combined with also().
Example usage:
qs.also_raw(field_matches='count(util:expand(%(xq_var)s//field)//exist:match)')
-
count()
Return the cached query hit count, executing the query first if it has not yet executed.
-
distinct()
Return distinct results.
This method returns an updated copy of the QuerySet: It does not modify the original. When results are returned from the updated copy, they will contain only distinct results.
-
exclude(**kwargs)
Filter the QuerySet to return a subset of the documents that do not contain any of the filters. Uses the same syntax and allows for the same filters as filter().
-
filter(combine='AND', **kwargs)
Filter the QuerySet to return a subset of the documents.
Arguments take the form lookuptype or field__lookuptype, where field is the name of a field in the QuerySet’s model and lookuptype is one of:
- exact – The field or object matches the argument value.
- contains – The field or object contains the argument value.
- startswith – The field or object starts with the argument value.
- fulltext_terms – the field or object contains any of the argument terms anywhere in the full text; requires a properly configured lucene index. By default, highlighting is enabled when this filter is used. To turn it off, specify an additional filter of highlight=False. Using fulltext_score for ordering, in return fields, is recommended.
- highlight – highlight search terms; when used with fulltext_terms, should be specified as a boolean (enabled by default); when used separately, takes a string using the same search format as fulltext_terms, but content will be returned even if it does not include the search terms. Requires a properly configured lucene index.
- in – field or object is present in a list of values
- exists – field or object is or is not present in the document; if True, field must be present; if False, must not be present.
- document_path – restrict the query to a single document; this must be a document path as returned by eXist, with full db path
- gt, gte, lt, lte – greater than, greater than or equal, less than, less than or equal
Field may be in the format of field__subfield when field is a NodeField or NodeListField and subfield is a configured element on that object.
Field may also be one of the predefined ‘special’ fields; see only() for the list of fields.
Any number of these filter arguments may be passed. This method returns an updated copy of the QuerySet: It does not modify the original.
Parameters: combine – optional; specify how the filters should be combined. Defaults to AND; also supports OR.
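The lookuptype and field__lookuptype naming scheme can be illustrated with a small standalone parser. This is a sketch only: split_filter_key is a hypothetical helper, not the library's implementation, and it deliberately leaves the lookup type as None when a key has no recognised suffix rather than guessing a default. It also ignores the case of a field whose own name collides with a lookup type.

```python
# Lookup types taken from the list documented for filter().
LOOKUP_TYPES = {
    "exact", "contains", "startswith", "fulltext_terms", "highlight",
    "in", "exists", "document_path", "gt", "gte", "lt", "lte",
}


def split_filter_key(key):
    """Split a filter keyword into (field_path, lookuptype).

    Either part may be None: a bare lookup type has no field path, and
    a key without a recognised suffix has no lookup type.
    """
    parts = key.split("__")
    if parts[-1] in LOOKUP_TYPES:
        # Everything before the suffix is the (possibly nested) field path.
        field_path = "__".join(parts[:-1]) or None
        return field_path, parts[-1]
    return key, None


bare_lookup = split_filter_key("contains")
field_lookup = split_filter_key("title__startswith")
nested_lookup = split_filter_key("author__name__exact")
no_lookup = split_filter_key("title")
```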
-
get(**kwargs)
Get a single result identified by filter arguments.
Takes any number of filter() arguments. Unlike filter(), though, this method returns exactly one item. If the filter expressions match no items, or if they match more than one, this method throws an exception.
Raises a eulexistdb.exceptions.DoesNotExist exception if no matches are found; raises a eulexistdb.exceptions.ReturnedMultiple exception if more than one match is found.
-
getDocument(docname)
Get a single document from the server by filename.
-
only(*fields)
Limit results to include only specified fields.
Parameters: fields – names of fields in the QuerySet’s model
This method returns an updated copy of the QuerySet: It does not modify the original. When results are returned from the updated copy, they will contain only the specified fields.
Special fields available:
- fulltext_score – lucene query; should only be used when a fulltext query has been used
- document_name, collection_name – document or collection name where xml content is stored in eXist
- hash – generate and return a SHA-1 checksum of the root element being queried
- last_modified – DateTimeField for the date the document the xml element belongs to was last modified
NOTE: Be aware that this will result in an XQuery with a constructed return. For large queries, this may have a significant impact on performance. For more details, see http://exist.sourceforge.net/tuning.html#N103A2 .
-
only_raw(**fields)
Limit results to include only specified fields, and return the specified field by xpath. Similar to (and can be combined with) only(). See Xquery.return_only() for details on specifying xpaths in raw mode.
See also_raw() for more details and usage example.
-
or_filter(**kwargs)
Filter the QuerySet to return a subset of the documents, but combine the filters with OR instead of AND. Uses the same syntax and allows for the same filters as filter(), with the exception that currently predefined special fields (see only()) are not supported.
-
order_by(field)
Order results returned according to a specified field. By default, all sorting is case-sensitive and in ascending order.
Parameters: field – the name (a string) of a field in the QuerySet’s model. If the field is prefixed with ‘-’, results will be sorted in descending order. If the field is prefixed with ‘~’, results will use a case-insensitive sort. The flags ‘-’ and ‘~’ may be combined in any order.
Example usage:
queryset.filter(fulltext_terms='foo').order_by('-fulltext_score')
queryset.order_by('~name')
This method returns an updated copy of the QuerySet. It does not modify the original.
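The prefix flags accepted by order_by() can be read off a field name with a short loop. The sketch below is an illustration of the documented flag semantics under the assumption that flags only appear at the start of the name; parse_order_field is a hypothetical helper, not the library's own parser.

```python
def parse_order_field(field):
    """Return (field_name, ascending, case_insensitive) for an order_by value.

    A leading '-' requests descending order and a leading '~' requests a
    case-insensitive sort; the two flags may appear in either order.
    """
    ascending = True
    case_insensitive = False
    while field and field[0] in "-~":
        if field[0] == "-":
            ascending = False
        else:
            case_insensitive = True
        field = field[1:]
    return field, ascending, case_insensitive


plain = parse_order_field("name")
descending = parse_order_field("-fulltext_score")
both_flags = parse_order_field("-~name")
swapped_flags = parse_order_field("~-name")
```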
-
order_by_raw(xpath, ascending=True)
Order results returned by a raw XPath.
Parameters: xpath – the xpath to be used
This method returns an updated copy of the QuerySet. It does not modify the original.
Example usage:
qs.order_by_raw('min(%(xq_var)s//date/string())')
-
query_result_type
Custom query result return type used to access a batch of results wrapped in an exist result as returned by the REST API. Extends eulexistdb.db.QueryResult to add an item-level result mapping, using return_type if appropriate.
-
reset()
Reset filters and cached results on the QuerySet.
This modifies the current query set, removing all filters, and resetting cached results.
-
result_id
Return the cached server result id, executing the query first if it has not yet executed.
-
return_type
Return type that will be used for initializing results returned from eXist queries. Either the subclass of XmlObject passed in to the constructor as model, or, if only() or also() has been used, a dynamically created instance of XmlObject with the xpaths modified based on the constructed xml return.
-
using(collection)
Specify the eXist collection to be queried.
If you are using an eulexistdb.models.XmlModel to generate queries against an eXist collection other than the one defined in settings.EXISTDB_ROOT_COLLECTION, you should use this function.
-
class eulexistdb.query.XmlQuery(node=None, context=None, **kwargs)
XmlObject class to allow describing queries in xml.
Django tie-ins for eulexistdb
Custom Template Tags
eulexistdb Management commands
The following management command will be available when you include eulexistdb in your django INSTALLED_APPS and rely on the existdb settings described above.
For more details on these commands, use manage.py help <command>
- existdb - update, remove, and show information about the index configuration for a collection index; reindex the configured collection based on that index configuration