Initial select strategy for SQL based datastores
By default, the initial SELECT statement uses LEFT OUTER JOINs in order to build table chains. LEFT OUTER JOINs are considered necessary to apply restrictions on properties that are stored in a related table (and not the root table) while keeping close to the semantic of filter expressions.
Consider the following example query:
1 <wfs:Query typeName="app:Philosopher">
2 <ogc:Filter>
3 <ogc:Or>
4 <ogc:PropertyIsEqualTo>
5 <ogc:PropertyName>app:placeOfBirth/app:Place/app:country/app:Country/app:name</ogc:PropertyName>
6 <ogc:Literal>France</ogc:Literal>
7 </ogc:PropertyIsEqualTo>
8 <ogc:PropertyIsEqualTo>
9 <ogc:PropertyName>app:placeOfBirth/app:Place/app:name</ogc:PropertyName>
10 <ogc:Literal>Mondovi</ogc:Literal>
11 </ogc:PropertyIsEqualTo>
12 </ogc:Or>
13 </ogc:Filter>
14 </wfs:Query>
The filter matches all philosophers who where born in a place in France OR that where born in Mondovi. The current implementation builds the following table chain (it gets the necessary mapping information from the annotated schema):
PHILOSOPHER -> PLACE -> COUNTRY
The corresponding SQL fragment looks like this (most selected columns omitted for brevity):
SELECT X1.NAME FROM PHILOSOPHER X1 LEFT OUTER JOIN PLACE X2 ON X2.ID=X1.PLACE_OF_BIRTH LEFT OUTER JOIN COUNTRY X3 ON X3.ID=X2.COUNTRY_ID WHERE (X3.NAME = 'France') OR (X2.NAME = 'Mondovi')
The SELECT matches the expected rows of the PHILOSOPHER table:
- Voltaire (born in France)
- Simone de Beauvoir (born in France)
- Jean-Paul Sartre (born in France)
- Albert Camus (born in Mondovi)
However, if we use an INNER JOIN instead:
SELECT X1.NAME FROM PHILOSOPHER X1 INNER JOIN PLACE X2 ON X2.ID=X1.PLACE_OF_BIRTH INNER JOIN COUNTRY X3 ON X3.ID=X2.COUNTRY_ID WHERE (X3.NAME = 'France') OR (X2.NAME = 'Mondovi')
Albert Camus is missing from the result set.
How can we improve this? When are OUTER JOINs really needed (most times INNER JOINs work well)?