The6thSense
The6thSense

Reputation: 8335

Postgresql - XML- Processing xpath with date comparission

I have ingested XML file in Postgresql DB and I trying to perform XPATH operation on the field with date comparison but it is not working. From Postgresql official site I found out that Postgresql uses Xpath1 and Xpath1 does not support date comparison. I tried changing my Xpath query to use string based comparison but still it is not matching.

Input Xpath:

//val:key[@modifiedDate>"2024-06-06"]

Since Direct date comparison is not available in Xpath1 I tried String based comparison.

Converted Xpath:

//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]

But this XPath is also not matching even though there is a matching Xpath. How should I frame my Xpath to match the field with date compaission.

The sample structure of XML ingested in table.

<root xmlns:val="http://some.uri">
    <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
    <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
    <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
    <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
</root>

The expected result is we should be able to match this XML becuase it xpaths <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key> and <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>, as their modifiedDate attributes start with dates later than "2024-06-06".

SQL Query for reproducing the problem:

CREATE TABLE xml_data (
    id SERIAL PRIMARY KEY,
    document XML
);

INSERT INTO xml_data (document)
VALUES ('<root xmlns:val="http://some.uri">
            <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
            <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
            <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
            <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
        </root>');

Query to filter data:

SELECT 
    id, document
FROM 
    xml_data
WHERE 
    xpath_exists('//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]', document);

Upvotes: 0

Views: 102

Answers (2)

Conal Tuohy
Conal Tuohy

Reputation: 3215

As Pepe N O points out, using the > operator in your XPath expression will convert the date operands from strings into numbers, but since both operands contain non-numeric characters, the conversion will produce NaN in both cases.

However, you could modify your XPath expression to use the XPath translate function to convert your dates into purely numerical values.

For example the expression translate('2024-06-07T12:34:56', '-T:', '') equals "20240607123456".

Upvotes: 1

Pepe N O
Pepe N O

Reputation: 2344

Considering that

XPath 1.0 has no ordering comparison operator for strings. Both "cat" < "dog" and "cat" > "dog" are false, because each is a numeric comparison of two NaNs.

You could access the values on your XML document and cast to date or timestamp and then make the comparison, this way (fixing a missing namespace xmlns:val="http://some.uri" to make it valid):

select key, modified_date from
xmltable(xmlnamespaces('http://some.uri' as val),
'//val:key'
passing (select document from xml_data) 
columns key varchar path 'text()',  
modified_date timestamp path '@modifiedDate')
where modified_date>'2024-06-06';

Resulting

key modified_date
Key1 2024-06-07 12:34:56.000
Key3 2024-06-10 14:22:33.000
Key4 2024-06-06 09:00:00.000

Fiddle to test

Upvotes: 1

Related Questions