deegree 3 datastore configuration concepts
Status: TMC proposal (2009-08-28)
This page describes relevant technical concepts to be considered in the configuration of datastores in deegree 3.
1. The general concept "datastore"
A basic observation is that the two most important categories of geospatial data are coverages and features and that most OGC services are based on providing access to -- or operations on -- stored datasets of these kinds.
Coverages: A coverage is a function describing the distribution of some set of properties over a spatial-temporal region. Although other descriptions of coverages are possible, the most common way to define a coverage is by providing a regular grid of measurements, i.e. a raster.
Features: A feature is a structured object with named properties and an identifier. Properties may have geometric and non-geometric values and may be (nested) features. The geometries are usually vector-based.
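The two definitions above can be made concrete with a minimal sketch, written in Python for brevity. All class and field names (`RasterCoverage`, `Feature`, `value_at`, and so on) are hypothetical and chosen for illustration only; they are not part of any deegree API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RasterCoverage:
    """A coverage given as a regular grid of measurements (a raster)."""
    origin_x: float             # x coordinate of the grid's lower-left corner
    origin_y: float             # y coordinate of the grid's lower-left corner
    cell_size: float            # edge length of one square grid cell
    values: list[list[float]]   # values[0] is the southernmost row

    def value_at(self, x: float, y: float) -> float:
        """Evaluate the coverage as a function of position (containing cell)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((y - self.origin_y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[row])):
            raise ValueError("position outside the coverage domain")
        return self.values[row][col]


@dataclass
class Feature:
    """A structured object with an identifier and named properties."""
    fid: str
    # Values may be non-geometric, geometric, or (nested) features.
    properties: dict[str, Any] = field(default_factory=dict)


building = Feature("b1", {
    "name": "Town hall",                                # non-geometric value
    "footprint": [(0, 0), (10, 0), (10, 8), (0, 8)],    # vector geometry
    "owner": Feature("o1", {"name": "City"}),           # nested feature
})
```

The coverage is queried by position, whereas the feature is addressed by identifier and property name; this difference is why the two categories are treated as separate kinds of datastores in the remainder of this page.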
1.1. OGC services and geospatial data
WMS: Provides queryable layers that are based either on coverages or on styled features.
WCS: Provides queryable coverage layers.
WFS: Provides (transactional) access to stored features.
Besides these three services, other services use access to stored coverages and features as well:
WPVS: Can make use of coverage data for DEMs (although not as efficiently as with pre-calculated multi-resolution triangle meshes) and for (overlaid) terrain textures. Feature data may be used as well (e.g. CityGML buildings).
WPS: Geospatial processes are very likely to need access to stored coverage and feature data.
It should be noted that the SOS and the CSW do not fit this model completely, as their native object model is neither coverage-based nor feature-based. The TMC currently opts for providing specialized data access mechanisms for these two services, although the possibility of backing the SOS by coverage/feature datastores should still be considered and evaluated. The deegree 3 WPVS uses a specialized data type as well: it uses multi-resolution triangle meshes to provide efficient rendering of high-volume DEMs.
1.2. Datastores
In the following, the term "datastore" is used to refer to stored geospatial data, i.e. coverages or features. For an efficient implementation, it is vital to use datastores as an abstraction layer between data providers (backends) and the deegree 3 OGC web services. This way, the implementation as well as the higher levels of the service configurations do not need to cope with the details of the specific source of the data.
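The role of the datastore as an abstraction layer can be sketched as follows. This is an illustration under stated assumptions, not the deegree 3 design: the interface `FeatureStore`, both backend classes and the function `count_features` are hypothetical names, and a CSV string stands in for a real file backend.

```python
import csv
import io
from abc import ABC, abstractmethod


class FeatureStore(ABC):
    """Hypothetical backend-agnostic interface, as seen by the services."""

    @abstractmethod
    def feature_type_names(self) -> list[str]: ...

    @abstractmethod
    def query(self, feature_type: str) -> list[dict]: ...


class MemoryFeatureStore(FeatureStore):
    """Backend that keeps all features in memory."""

    def __init__(self, features: dict[str, list[dict]]):
        self._features = features

    def feature_type_names(self) -> list[str]:
        return sorted(self._features)

    def query(self, feature_type: str) -> list[dict]:
        return list(self._features.get(feature_type, []))


class CsvFeatureStore(FeatureStore):
    """Stand-in for a file backend: one feature type read from CSV text."""

    def __init__(self, feature_type: str, csv_text: str):
        self._type = feature_type
        self._rows = list(csv.DictReader(io.StringIO(csv_text)))

    def feature_type_names(self) -> list[str]:
        return [self._type]

    def query(self, feature_type: str) -> list[dict]:
        return list(self._rows) if feature_type == self._type else []


def count_features(store: FeatureStore, feature_type: str) -> int:
    """Service-side code: depends only on the interface, never on a backend."""
    return len(store.query(feature_type))
```

`count_features` works unchanged with either backend, which is the property the abstraction layer is meant to guarantee for the service implementations.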
1.2.1. Coverage datastore backends
Coverage datastores can be backed by different storage formats / technologies. Some examples:
- Pyramids of raster tiles (e.g. deegree 2 raster trees), which may be based on multiple formats (JPG, PNG, TIFF, GeoTIFF, ...)
- ECW (internally tiled raster data)
- Oracle GeoRaster
- PostGIS Raster
- Remote WMS (source is a layer of a remote WMS)
- Remote WCS (source is a layer of a remote WCS)
- OSM (remote)
- GoogleMaps (remote)
- Remote WPS processes that return raster data
- Raster data derived/calculated by Java code
1.2.2. Feature datastore backends
Feature datastores can be backed by different storage formats / technologies. Some examples:
- PostGIS
- Oracle Spatial
- Microsoft SQL Server
- MySQL Spatial
- ESRI Shapefiles
- ArcSDE
- Remote WFS (source is a featuretype/schema of a remote WFS)
- Remote WPS processes that return features
- GML documents (stored on the file system)
2. Configuration of datastores in deegree 3
In order to provide a consistent and reusable way of accessing stored coverage and feature data, it is crucial that all services can use the same code and configuration mechanisms.
2.1. Requirements of different datastores
Tests with different kinds of backends have shown that the user-friendliness of the configuration can be improved considerably over deegree 2 if the configuration of each backend requires only the information that is specific to that backend type:
- Feature datastore based on shapefiles: For setting up a shape-based datastore quickly, it is usually sufficient to provide the shapefile itself. The feature type and properties can be derived automatically. If a user wishes to customize them, however, more information could be provided in a simple XML file.
- Feature datastore based on GML files: To configure this kind of datastore, it has proved sufficient and convenient to simply provide the GML schema and GML instance documents. This allows a test WFS for a certain application schema to be set up easily and quickly.
- Feature datastore based on SQL backends: These datastores have the most complex configuration, as they require a (possibly complex) definition of the feature type as well as relational mapping information. It could, however, be worthwhile to integrate a simple variant (for simple feature types) that uses a table name as the feature type name and derives the properties from the columns.
The TMC favors custom-tailoring the configuration process to the different backends, so that the setup becomes as easy and hassle-free as possible.
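The simple SQL variant mentioned above (table name as feature type name, columns as properties) could look roughly like the following sketch. It uses Python's built-in SQLite module purely as a stand-in for the SQL backends listed earlier; the function `derive_feature_type` and the structure of its result are assumptions made for illustration.

```python
import sqlite3


def derive_feature_type(conn: sqlite3.Connection, table: str) -> dict:
    """Use the table name as feature type name and the columns as properties."""
    # Quote the table name as an SQL identifier before interpolating it.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    if not columns:
        raise ValueError(f"no such table: {table}")
    return {
        "name": table,
        "properties": [
            {"name": col[1], "type": col[2], "required": bool(col[3])}
            for col in columns
        ],
    }


# Hypothetical example table; a real setup would connect to an existing database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT NOT NULL, geom BLOB)"
)
feature_type = derive_feature_type(conn, "roads")
```

In such a variant the user would only have to name the connection and the table; anything beyond this one-to-one mapping would still need the full feature type definition and relational mapping.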
2.2. Local datastore configuration
It should be possible to configure datastores as part of a service configuration, e.g. inside the WMS configuration document or as datastore configuration documents that are located inside or below the WMS configuration directory. These datastores are only relevant for the specific service.
2.3. Global datastore configuration
Besides configuring datastores individually for certain services, the TMC also recommends a second (global) way, which is used to define datastores that can be referenced from the different service configurations by a unique identifier. Some use cases:
- Setting up WFS and WCS instances that provide the same datasets that are accessible by a WMS. Note that such a setup is advocated by the SLD specification for WMS instances that allow the definition of user styles.
- Setting up a WPS (and processes) that operates on the same feature datasets as served by a WFS.
In both cases, duplication of configuration as well as errors/inconsistencies are avoided.
The following is an example of a file-based configuration of global datastores. The configuration files would be placed below the datastores folder in the WEB-INF/conf directory of the services webapp and scanned on startup or dynamically.
conf/
`-- datastores
    |-- coverage
    |   |-- rasterstore1.xml
    |   |-- rasterstore2.xml
    |   `-- rasterstoreN.xml
    `-- feature
        |-- featurestore1.xml
        |-- featurestore2.xml
        `-- featurestoreN.xml
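Scanning this directory layout on startup could be sketched as follows. The function `scan_datastores` is hypothetical, and using the file name without its `.xml` extension as the unique datastore identifier is an assumption of this sketch; the proposal does not specify how identifiers are assigned.

```python
import tempfile
from pathlib import Path


def scan_datastores(conf_dir: Path) -> dict[str, dict[str, Path]]:
    """Map each category ("coverage", "feature") to {datastore id: config file}."""
    registry: dict[str, dict[str, Path]] = {"coverage": {}, "feature": {}}
    for category, stores in registry.items():
        category_dir = conf_dir / "datastores" / category
        for config_file in sorted(category_dir.glob("*.xml")):
            # Assumption: the file name (without ".xml") is the unique id.
            stores[config_file.stem] = config_file
    return registry


# Demonstration on a temporary copy of the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "conf"
    for rel in ("coverage/rasterstore1.xml",
                "feature/featurestore1.xml",
                "feature/featurestore2.xml"):
        path = conf / "datastores" / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("<!-- backend-specific configuration goes here -->")
    found = {cat: sorted(stores) for cat, stores in scan_datastores(conf).items()}

print(found)
```

A service configuration would then refer to a datastore by its identifier only (for example `featurestore1`) and leave the backend details to the global configuration file.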
If the plans for a web-based service configuration are realized in a later phase, the service/datastore configuration process can of course be integrated in a guided and user-friendly way.
2.4. Relation between datastore and service configuration
For a clean technical implementation, it is important to separate the datastore concepts from details that are specific to the configuration of a certain service.
Originally, it was planned to provide bbox and scale constraints in the configuration of coverage datastores. However, during the development of the WPVS, it turned out that both introduce fundamental problems and cannot serve as a solid means of switching between different resolutions and datasets. In the case of the WPVS, the WMS-motivated 2D scale constraint does not work, as no single scale exists for a 3D scene. The bounding box constraint also proved to be rather unusable as a means of selecting datasets or levels of detail (LOD), as it has to be deeply integrated with WPVS data structures in order to work as expected.
In order to make the datastores and their configuration safely reusable in different contexts, the TMC advises against these kinds of constraints on the datastore level and recommends moving them to the service configurations.
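One way to picture this separation is a service-level layer definition that carries the constraints and refers to constraint-free datastores by identifier. The sketch below is an illustration only: `LayerDataset`, `select_datastore`, the datastore ids and the interpretation of scale as a scale denominator are all assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerDataset:
    """Service-level entry: a datastore reference plus a scale constraint."""
    datastore_id: str                  # refers to a datastore without constraints
    min_scale: float = 0.0             # inclusive lower bound (scale denominator)
    max_scale: float = float("inf")    # exclusive upper bound (scale denominator)


def select_datastore(datasets: list[LayerDataset], scale: float) -> Optional[str]:
    """Return the id of the first dataset whose scale range contains `scale`."""
    for dataset in datasets:
        if dataset.min_scale <= scale < dataset.max_scale:
            return dataset.datastore_id
    return None


# Hypothetical WMS layer: detailed raster below 1:25000, overview raster above.
wms_layer = [
    LayerDataset("rasterstore1", max_scale=25_000),
    LayerDataset("rasterstore2", min_scale=25_000),
]
```

Because the scale ranges live in the WMS layer definition, the same two datastores could be referenced by a WPVS configuration that applies an entirely different selection strategy.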
3. Summary of TMC recommendations
- Datastores should reflect the general concepts for accessing coverage/feature data in a backend-agnostic manner. Conceptually, datastores should simply be providers of coverage or feature data.
- In order to be usable in different types of services, the configuration of datastores should not contain any details that are specific to a certain service.
- The usage of constraints (e.g. bounding box and scale) in the datastore configuration is discouraged. Constraints should be integrated in the configuration of objects on higher levels (e.g. in the WMS layer configuration).
- Datastore configuration should be based on common XML schemas and mechanisms that can be used throughout all services in an identical manner.
- Besides offering the possibility to embed datastore configurations inside service configuration documents, it should also be possible to configure global datastores that can be referenced by a service configuration (or a WPS process) by a unique identifier.