Oracle selecting all values from a CSV line into custom type

Question

I have an Oracle stored procedure where we have to parse an oddly formatted flat file line by line, due to the fact that it has varying amounts of columns that are dictated by different header structures. So, some example data from the file might look like this (and yes, those headers are inline in the file):

ID  TYPE DATE                VALUE1   VALUE2
1   A    3-1-2021 12:00:00   10       24
1   A    3-2-2021 12:00:00   25       25
1   A    3-3-2021 12:00:00   31       36
ID  TYPE DATE                VALUE1   UNNECESSARY_VALUE VALUE2
2   B    3-1-2021 12:00:00   10       24                14
2   B    3-2-2021 12:00:00   25       25                22
2   B    3-3-2021 12:00:00   31       36                12
ID  TYPE DATE                VALUE1   VALUE2
3   A    3-1-2021 12:00:00   10       24
3   A    3-2-2021 12:00:00   25       25
3   A    3-3-2021 12:00:00   31       36
ID  TYPE DATE                VALUE1   UNNECESSARY_VALUE1 UNNECESSARY_VALUE2 VALUE2
4   C    3-1-2021 12:00:00   10       21                 22                 38
4   C    3-2-2021 12:00:00   25       21                 42                 23
4   C    3-3-2021 12:00:00   31       22                 32                 34

We are getting this from an external entity that has simply refused to make this an easier file to ingest. What I'm doing right now is getting a line, determining if it's a header or not and tracking what is present in each line with some variables. The TYPE column will usually also coincide with the number of columns, telling us which ones we need to get. I'm also clearing the white space and making the line comma separated, as the amount of space isn't necessarily constant.

I'm using REGEXP_SUBSTR to separate each line and save several variables. Consequently, that requires several calls to REGEXP_SUBSTR. So, I have several calls to get different values like so:

--v_val_index and v_val2_index are based on header row positioning
v_id   := REGEXP_SUBSTR(v_line, '(.*?,){'||(0)||'}([^,]*)', 1, 1, '', 2);
v_type := REGEXP_SUBSTR(v_line, '(.*?,){'||(1)||'}([^,]*)', 1, 1, '', 2);
v_date := REGEXP_SUBSTR(v_line, '(.*?,){'||(2)||'}([^,]*)', 1, 1, '', 2);
v_val1 := REGEXP_SUBSTR(v_line, '(.*?,){'||(v_val1_index)||'}([^,]*)', 1, 1, '', 2);
v_val2 := REGEXP_SUBSTR(v_line, '(.*?,){'||(v_val2_index)||'}([^,]*)', 1, 1, '', 2);

The function parses a 100 MB file in about 30 seconds, so execution time isn't terrible but I want to verify if there is room for improvement. Therefore, my question is two part. Instead of using REGEXP_SUBSTR, is there a way to break up the line into a custom type and then reference the pieces of the line by index? If so, will that make a noticeable performance improvement?

Oracle selecting all values from a CSV line into custom type

Answers (1)

Related Questions