Ravi Kumar
Ravi Kumar

Reputation: 1923

Create external table from nested directory in oracle

How to create external table from nested directory (directory name will be the column value ) one directory may have multiple sub directory one simple example given below

Expected Table

Country  |  India | Karnataka | Bengaluru |…content of data.txt  
Country  |  India | Maharashtra| Mumbai|…content of data1.txt

Upvotes: 2

Views: 657

Answers (1)

Roberto Hernandez
Roberto Hernandez

Reputation: 8518

This is an interesting issue I came across in one of my projects. So, I am going to show basically what I did and all components you need.

Components

  1. Shell Script to run the fill up of the external data into a csv file
  2. Database Objects ( Table External and Final Table )
  3. Clob function and Blob function to treat the external content of the files
  4. Dynamic number of files in final directory implies one row per file

Scenario

Let's imagine I have the following structure

ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
   .
   |-India
   |---Karnataka
   |-----Bengaluru
   |-Spain
   |---Catalunya
   |-----Barcelona
$ ls -R
.:
India  Spain

./India:
Karnataka

./India/Karnataka:
Bengaluru

./India/Karnataka/Bengaluru:
data.txt

./Spain:
Catalunya

./Spain/Catalunya:
Barcelona

./Spain/Catalunya/Barcelona:
data1.txt  data2.txt

Following your logic I need an external table as follows

Country|India|Karnataka|Bengaluru|Filename|content of data.txt
Country|Spain|Catalonia|Barcelona|Filename|content of data1.txt
Country|Spain|Catalonia|Barcelona|Filename|content of data2.txt 

Contents of the files

pwd
/test/India/Karnataka/Bengaluru
$ cat data.txt
This is an example of text for this city
$ cd ../../../Spain/Catalunya/Barcelona/
$ cat data1.txt
Another example for Catalonia and Barcelona
$ cat data2.txt
Second example for Catalonia and Barcelona

So, our external table would be

CREATE TABLE EXTERNAL_DATA (
  TYPE              VARCHAR2(4000),
  NAME              VARCHAR2(4000),
  REGION            VARCHAR2(4000),
  CITY              VARCHAR2(4000),
  FILENAME          VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY dir_util_db
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY newline
    FIELD NAMES ALL FILES IGNORE
    BADFILE dir_util_db:'ext_tab_inp.bad'
    DISCARDFILE dir_util_db:'ext_tab_inp.dsc'
    LOGFILE dir_util_db:'ext_tab_inp.log' 
    FIELDS TERMINATED BY '|'
    NOTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL 
    FIELDS
    (
       TYPE              CHAR(4000),
       NAME              CHAR(4000),
       REGION            CHAR(4000),
       CITY              CHAR(4000),
       FILENAME          CHAR(4000)
    )
  )
  LOCATION ('ext_data_test.csv')
)
REJECT LIMIT UNLIMITED;

Our definitive table will include the blob column.

CREATE TABLE FINAL_DATA (
      TYPE              VARCHAR2(10),
      NAME              VARCHAR2(20),
      REGION            VARCHAR2(20),
      CITY              VARCHAR2(20),
      FILENAME          VARCHAR2(20),
      DATA              BLOB
    )

We will need some functions to retrieve the blob file from the filesystem and store its content as blob data. We will also need a function to convert the blob content as clob text.

Step by step

Number of records for the external table

$ find . -name "*.txt" | wc -l
3

Fill up the csv file

$ echo "TYPE|NAME|REGION|CITY|FILENAME" > ext_tab_test.csv
$ for i in $(find . -name "*.txt" -type f -print)
> do
> var=$(echo $i)
> record_type="Country"
> record_name=$(echo $var | awk -F '/' '{print $2}')
> record_region=$(echo $var | awk -F '/' '{print $3}')
> record_city=$(echo $var | awk -F '/' '{print $4}')
> record_file=$(echo $var | awk -F '/' '{print $NF}')
> echo "$record_type|$record_name|$record_region|$record_city|$record_file" >> ext_tab_test.csv
> done

$ cat ext_tab_test.csv
TYPE|NAME|REGION|CITY|FILENAME
Country|India|Karnataka|Bengaluru|data.txt
Country|Spain|Catalunya|Barcelona|data1.txt
Country|Spain|Catalunya|Barcelona|data2.txt

Let's go to the database

Create the directory where the external table is located

SQL> create or replace directory dir_test as '/test' ;

Directory created.

Create the external table

SQL>  CREATE TABLE EXTERNAL_DATA (
      TYPE              VARCHAR2(4000),
      NAME              VARCHAR2(4000),
      REGION            VARCHAR2(4000),
      CITY              VARCHAR2(4000),
      FILENAME          VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
      DEFAULT DIRECTORY dir_test
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY newline
        FIELD NAMES ALL FILES IGNORE
         BADFILE dir_test:'ext_tab_test.bad'
         DISCARDFILE dir_test:'ext_tab_test.dsc'
         LOGFILE dir_test:'ext_tab_test.log'
         FIELDS TERMINATED BY '|'
         NOTRIM
        MISSING FIELD VALUES ARE NULL
        REJECT ROWS WITH ALL NULL
        FIELDS
        (
          TYPE              CHAR(4000),
          NAME              CHAR(4000),
          REGION            CHAR(4000),
          CITY              CHAR(4000),
          FILENAME          CHAR(4000)
        )
      )
      LOCATION ('ext_tab_test.csv')
      )
      REJECT LIMIT UNLIMITED;

Table created.

SQL> col type for a40
SQL> col name for a40
SQL> col region for a40
SQL> col city for a40
SQL> col filename for a40
SQL> set lines 200
SQL> select * from EXTERNAL_DATA

TYPE       NAME                                     REGION                                   CITY                                     FILENAME
---------- ---------------------------------------- ---------------------------------------- ---------------------------------------- ----------------------------------------
Country    India                                    Karnataka                                Bengaluru                                data.txt
Country    Spain                                    Catalunya                                Barcelona                                data1.txt
Country    Spain                                    Catalunya                                Barcelona                                data2.txt

Now we are going to merge the content of this external table into our definitive table. Create final table

SQL> CREATE TABLE FINAL_DATA
 (
 TYPE              VARCHAR2(10),
 NAME              VARCHAR2(20),
 REGION            VARCHAR2(20),
 CITY              VARCHAR2(20),
 FILENAME          VARCHAR2(20),
 DATA              BLOB
 )  2    3    4    5    6    7    8    9  ;

Table created.

insert/merge the data from the external table into the final table

SQL> merge into FINAL_DATA target
using ( select * from EXTERNAL_DATA ) source
on (  target.type = source.type and
      target.name = source.name and
      target.region = source.region and
      target.city = source.city
    )
  2    3    4    5    6    7    8  when not matched then
  9  insert ( type, name, region, city, filename )
values
( source.type , source.name , source.region, source.city , source.filename ); 10   11

3 rows merged.

SQL> commit ;

Commit complete.

Now we need the procedure to convert the blob object ( file ) into a blob object and store it into the table

CREATE OR REPLACE FUNCTION os_file_to_blob(p_dir in varchar2 , p_file_name VARCHAR2) RETURN BLOB AS
dest_loc BLOB := empty_blob();
src_loc BFILE := BFILENAME(p_dir, p_file_name);
BEGIN
DBMS_LOB.OPEN(src_loc, DBMS_LOB.LOB_READONLY);
DBMS_LOB.CREATETEMPORARY(
lob_loc => dest_loc
, cache => true
, dur => dbms_lob.session
);
DBMS_LOB.OPEN(dest_loc, DBMS_LOB.LOB_READWRITE);
DBMS_LOB.LOADFROMFILE(
dest_lob => dest_loc
, src_lob => src_loc
, amount => DBMS_LOB.getLength(src_loc));
DBMS_LOB.CLOSE(dest_loc);
DBMS_LOB.CLOSE(src_loc);
RETURN dest_loc;
END os_file_to_blob;
/

And a function to covert the blob into a clob

create or replace function fil_rep.blob_to_char (b blob)
return clob is
  v_clob    clob;
  n         number;
  v_start   pls_integer := 1;
  v_buffer  pls_integer := 32767;
  v_varchar varchar2(32767);
begin
  if (b is null) 
  then
    return null;
  end if;
  if (dbms_lob.getlength(b)=0) 
  then
    return empty_clob();
  end if;
  dbms_lob.createtemporary(v_clob,true);
  for i in 1..ceil(dbms_lob.getlength(b) / v_buffer)
  loop
    v_varchar := utl_raw.cast_to_varchar2(dbms_lob.substr(b, v_buffer, v_start));
    dbms_lob.writeappend(v_clob, length(v_varchar), v_varchar);
    v_start := v_start + v_buffer;
  end loop;
RETURN v_clob;
end blob_to_char;
/

Create the functions for blob treatment

SQL> CREATE OR REPLACE FUNCTION os_file_to_blob(p_dir in varchar2 , p_file_name VARCHAR2) RETURN BLOB AS
  2  dest_loc BLOB := empty_blob();
  3  src_loc BFILE := BFILENAME(p_dir, p_file_name);
  4  BEGIN
  5  DBMS_LOB.OPEN(src_loc, DBMS_LOB.LOB_READONLY);
  6  DBMS_LOB.CREATETEMPORARY(
  7  lob_loc => dest_loc
  8  , cache => true
  9  , dur => dbms_lob.session
 10  );
 11  DBMS_LOB.OPEN(dest_loc, DBMS_LOB.LOB_READWRITE);
 12  DBMS_LOB.LOADFROMFILE(
 13  dest_lob => dest_loc
 14  , src_lob => src_loc
 15  , amount => DBMS_LOB.getLength(src_loc));
 16  DBMS_LOB.CLOSE(dest_loc);
 17  DBMS_LOB.CLOSE(src_loc);
 18  RETURN dest_loc;
 19* END os_file_to_blob;
 /

 Function created.

SQL> create or replace function blob_to_char (b blob)
  2  return clob is
  3    v_clob    clob;
  4    n         number;
  5    v_start   pls_integer := 1;
  6    v_buffer  pls_integer := 32767;
  7    v_varchar varchar2(32767);
  8  begin
  9    if (b is null)
 10    then
 11      return null;
 12    end if;
 13    if (dbms_lob.getlength(b)=0)
 14    then
 15      return empty_clob();
 16    end if;
 17    dbms_lob.createtemporary(v_clob,true);
 18    for i in 1..ceil(dbms_lob.getlength(b) / v_buffer)
 19    loop
 20     v_varchar := utl_raw.cast_to_varchar2(dbms_lob.substr(b, v_buffer, v_start));
 21     dbms_lob.writeappend(v_clob, length(v_varchar), v_varchar);
 22     v_start := v_start + v_buffer;
 23    end loop;
 24  RETURN v_clob;
 25* end blob_to_char;
 /

Function created.

Now, we need to create a directory for each location we have

Directories to be created

create or replace directory dir_01 as '/test/India/Karnataka/Bengaluru' ;
create or replace directory dir_02 as '/test/Spain/Catalunya/Barcelona' ;

Update the blobs for each directory

SQL> DECLARE
v_blob BLOB;
v_file varchar2(40);
BEGIN
for h in 
      ( select filename from final_data where data is null and name = 'India' )
loop 
    v_file := h.filename;
    begin
        v_blob := os_file_to_blob ('DIR_01' , v_file);
        update final_data set data = v_blob where filename = v_file;
        commit;
        exception when others then null;
    end;
end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> DECLARE
v_blob BLOB;
v_file varchar2(40);
BEGIN
for h in 
      ( select filename from final_data where data is null and name = 'Spain' )
loop 
    v_file := h.filename;
    begin
        v_blob := os_file_to_blob ('DIR_02' , v_file);
        update final_data set data = v_blob where filename = v_file;
        commit;
        exception when others then null;
    end;
end loop;
end;
/

PL/SQL procedure successfully completed.

Final select

SQL> select a.filename , blob_to_char(a.data) as content from final_data a ;

FILENAME                                 CONTENT
---------------------------------------- --------------------------------------------------------------------------------
data.txt                                 This is an example of text for this city
data1.txt                                Another example for Catalonia and Barcelona
data2.txt                                Second example for Catalonia and Barcelona

SQL>

Obviously, if I would to do it, I would put a field in the table called directory_path, therefore I could make the creation of the directories totally automatic.

I did not publish everything I had in mind, just enough for you to get an idea. In my case

  • All the process is included in a shell script
  • The process is triggered by dbms_scheduler every single day
  • In my case, I updating some logfiles for different directories to a final table for some users can check them. The reason is that they have no access to the Filesystem in the server.
  • I use merge statement to insert/update in the final table, therefore I only upload the content of new or modified files. For that I retrieve in the csv file the file's timestamp.

Please, share your doubts and suggestions.

Upvotes: 1

Related Questions