ziggy
ziggy

Reputation: 15876

Oracle SQL - Selecting a date from an XML tag and casting it to a DATE type to allow comparison of the dates

I have a CLOB type column (called xml) that stores an xml document. The xml document has a tag called <productionDate>.

What i want to do is search the table and retrieve any row that contains an xml document that has productionDate between two dates.

I know how to read the xml tag as a value:

Select 
xmltype(xml).extract('//product/productionDate/text()').getStringVal() from myTable

The above query returns the following date.

1999-09-23 00:00:00.0 UTC
2000-01-18 00:00:00.0 UTC
2000-01-18 00:00:00.0 UTC
1999-11-02 00:00:00.0 UTC
1999-11-02 00:00:00.0 UTC
1999-11-02 00:00:00.0 UTC
1999-11-02 00:00:00.0 UTC
1999-11-02 00:00:00.0 UTC
1999-11-02 00:00:00.0 UTC

I want to only list the ones that are between 01-NOV-1999 and and 01-JAN-2000. To do this i tried to cast the value to a DATE type

Select 
XMLCast(xmltype(xml).extract('//product/productionDate/text()').getStringVal() as DATE) from myTable

I get the following error:

ORA-01830: date format picture ends before converting entire input string

Presumably this is because the format of the date cannot be converted to a date format. What kind i do to read the date as a valid date (Ignoring the time and the text UTC) so that i can do a comparison on the date.

Maybe a substring might work but i want to hear what other more efficient options are available out there.

Upvotes: 0

Views: 6501

Answers (3)

Marco Gralike
Marco Gralike

Reputation: 11

In all, you also could use an XMLTYPE data type, which actually is optimized for handling XML. And it will outperform more alternative solutions than you would think of. No poor man's solutions needed (if not only, also in 8i/9.1/9.2 database versions this would be a bad "solution"). That said, XML(DB) is a "no cost option" that can freely be used / no extra license needed.

Upvotes: 1

Alex Poole
Alex Poole

Reputation: 191275

You're getting the value as a string, so you should just be able to use use to_date or to_timestamp:

Select 
  to_timestamp(xmltype(xml).extract('//product/productionDate/text()').getStringVal(),
    'YYYY-MM-DD HH24:MI:SS.FF "UTC"')
from myTable

SQL Fiddle demo.

You could move it to a different time zone at the same time if you needed to.

Of course, given the date format you have, you could compare as strings and skip the conversion...

Upvotes: 2

Richard Pascual
Richard Pascual

Reputation: 2021

Poor-Man's Method for Optimizing Queries for File-Based Data Types

There are a number of ways to optimize storage and retrieval of file-based data types (CLOB/XML). If you're anchored to a relational database solution such as Oracle, here's some simple suggestions that use out-of-the-box functionality without having to buy, license or install additional add-ons and products.

Initial Thoughts: Thinking About Traffic and Usage

It may help you to consider how often you need to dive into the XML files to extract their properties. It would also be important to consider how often the contents of these XML files change once they have been added to your CLOB-based table.

If the DML traffic involving your CLOB typed table isn't too heavy, consider building an assisting object which has the types of hooks that will help you find the right XML files when you need it. Once you have it, you can divert query activity away from your source table except for specific Primary Key based inquiries.

Inline Function Conversions and Their Optimization

That being said, you could apply conversion functions on the fly and transform the date value every time you query it. There are even add-ons for Oracle that optimizes the database for querying and searching for values within LOB objects....

There is even an optimization convention called a FUNCTION BASED INDEX, which prompts Oracle to use an alternate reference (index) that also uses the TO_DATE/SUBSTR function combinations you would need to convert the value in your XML document.

There are alternate methods to INLINE conversions, which are probably a lot better because an INLINE conversion is applied every time the query is called. In general, no conversion functions and data in their native format (i.e., date as date, datetime or timestamp) run much faster and with less cost to existing database resources.

Externally Tagging Your XML Documents

Assuming that the CLOB table with your XML files also has a primary key column, two ideas come to mind:

  1. Create a second table with the converted values (such as PRODUCTION_DATE), identify each date record with the PK ID of the CLOB that it came from. This table can be extended as you discover new attributes that are frequently accessed by your queries. Manage the second table by placing a trigger on the CLOB table.

    Anytime a clob record is added, the data value is extracted and converted exactly once. If anything queries for a clob record that has not changed, there should be no need to extract and convert the value since the last time it was queried.

  2. Create a Materialized View containing the Extraction and Conversion function so that the MView column for PRODUCTION_DATE can be defined as a real DATE type. MViews also work like Idea (1), but a little more elegantly.

    If it's possible a FAST refreshable view would work well as it manages your MView automatically and in near real-time response to changes in your CLOB table. A FAST refreshable view only updates the MView with changes and additions, so even if your source table is huge, daily operations are based on incremental effects only.

Not knowing the volume, usage or stats on your data a COMPLETE refresh type MView may or may not be possible or efficient without additional compromises or assumptions. Sometimes MView query definitions are too complex to qualify for the FAST refreshable format.

In either case, you end up with a second object (MVIEW or TABLE) that contains a DATE formatted value for PRODUCTION_DATE. Query this TAG table instead and use the associated PK value to identify which record in your CLOB table should be accessed once you've narrowed down the set or individual record that meets the query criteria.

Parting Thoughts (Optional... sort of)

If it's possible to have a NULL PRODUCTION_DATE value, it would be tempting to simply apply a NVL() inline function when querying the TAG supporting table. Depending on the volume of data in question, this probably shouldn't be a problem. Ideally, you'll want to always have a value in there... and a NOT NULL constraint on that DATE column... those kinds of things help the db optimizer make better assumptions about how to dive into the tables... especially if there are lots and lots of records in there.

Upvotes: 1

Related Questions