Reputation: 1923
How to create external table from nested directory (directory name will be the column value ) one directory may have multiple sub directory one simple example given below
Country→India→Karnataka→Bengaluru→….data.txt
Country→India→Maharashtra→Mumbai→....data1.txt
Expected Table
Country | India | Karnataka | Bengaluru |…content of data.txt
Country | India | Maharashtra| Mumbai|…content of data1.txt
Upvotes: 2
Views: 657
Reputation: 8518
This is an interesting issue I came across in one of my projects. So, I am going to show basically what I did and all components you need.
Components
Scenario
Let's imagine I have the following structure
ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/'
.
|-India
|---Karnataka
|-----Bengaluru
|-Spain
|---Catalunya
|-----Barcelona
$ ls -R
.:
India Spain
./India:
Karnataka
./India/Karnataka:
Bengaluru
./India/Karnataka/Bengaluru:
data.txt
./Spain:
Catalunya
./Spain/Catalunya:
Barcelona
./Spain/Catalunya/Barcelona:
data1.txt data2.txt
Following your logic I need an external table as follows
Country|India|Karnataka|Bengaluru|Filename|content of data.txt
Country|Spain|Catalonia|Barcelona|Filename|content of data1.txt
Country|Spain|Catalonia|Barcelona|Filename|content of data2.txt
Contents of the files
pwd
/test/India/Karnataka/Bengaluru
$ cat data.txt
This is an example of text for this city
$ cd ../../../Spain/Catalunya/Barcelona/
$ cat data1.txt
Another example for Catalonia and Barcelona
$ cat data2.txt
Second example for Catalonia and Barcelona
So, our external table would be
CREATE TABLE EXTERNAL_DATA (
TYPE VARCHAR2(4000),
NAME VARCHAR2(4000),
REGION VARCHAR2(4000),
CITY VARCHAR2(4000),
FILENAME VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY dir_util_db
ACCESS PARAMETERS (
RECORDS DELIMITED BY newline
FIELD NAMES ALL FILES IGNORE
BADFILE dir_util_db:'ext_tab_inp.bad'
DISCARDFILE dir_util_db:'ext_tab_inp.dsc'
LOGFILE dir_util_db:'ext_tab_inp.log'
FIELDS TERMINATED BY '|'
NOTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL
FIELDS
(
TYPE CHAR(4000),
NAME CHAR(4000),
REGION CHAR(4000),
CITY CHAR(4000),
FILENAME CHAR(4000)
)
)
LOCATION ('ext_data_test.csv')
)
REJECT LIMIT UNLIMITED;
Our definitive table will include the blob column.
CREATE TABLE FINAL_DATA (
TYPE VARCHAR2(10),
NAME VARCHAR2(20),
REGION VARCHAR2(20),
CITY VARCHAR2(20),
FILENAME VARCHAR2(20),
DATA BLOB
)
We will need some functions to retrieve the blob file from the filesystem and store its content as blob data. We will also need a function to convert the blob content as clob text.
Step by step
Number of records for the external table
$ find . -name "*.txt" | wc -l
3
Fill up the csv file
$ echo "TYPE|NAME|REGION|CITY|FILENAME" > ext_tab_test.csv
$ for i in $(find . -name "*.txt" -type f -print)
> do
> var=$(echo $i)
> record_type="Country"
> record_name=$(echo $var | awk -F '/' '{print $2}')
> record_region=$(echo $var | awk -F '/' '{print $3}')
> record_city=$(echo $var | awk -F '/' '{print $4}')
> record_file=$(echo $var | awk -F '/' '{print $NF}')
> echo "$record_type|$record_name|$record_region|$record_city|$record_file" >> ext_tab_test.csv
> done
$ cat ext_tab_test.csv
TYPE|NAME|REGION|CITY|FILENAME
Country|India|Karnataka|Bengaluru|data.txt
Country|Spain|Catalunya|Barcelona|data1.txt
Country|Spain|Catalunya|Barcelona|data2.txt
Let's go to the database
Create the directory where the external table is located
SQL> create or replace directory dir_test as '/test' ;
Directory created.
Create the external table
SQL> CREATE TABLE EXTERNAL_DATA (
TYPE VARCHAR2(4000),
NAME VARCHAR2(4000),
REGION VARCHAR2(4000),
CITY VARCHAR2(4000),
FILENAME VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY dir_test
ACCESS PARAMETERS (
RECORDS DELIMITED BY newline
FIELD NAMES ALL FILES IGNORE
BADFILE dir_test:'ext_tab_test.bad'
DISCARDFILE dir_test:'ext_tab_test.dsc'
LOGFILE dir_test:'ext_tab_test.log'
FIELDS TERMINATED BY '|'
NOTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL
FIELDS
(
TYPE CHAR(4000),
NAME CHAR(4000),
REGION CHAR(4000),
CITY CHAR(4000),
FILENAME CHAR(4000)
)
)
LOCATION ('ext_tab_test.csv')
)
REJECT LIMIT UNLIMITED;
Table created.
SQL> col type for a40
SQL> col name for a40
SQL> col region for a40
SQL> col city for a40
SQL> col filename for a40
SQL> set lines 200
SQL> select * from EXTERNAL_DATA
TYPE NAME REGION CITY FILENAME
---------- ---------------------------------------- ---------------------------------------- ---------------------------------------- ----------------------------------------
Country India Karnataka Bengaluru data.txt
Country Spain Catalunya Barcelona data1.txt
Country Spain Catalunya Barcelona data2.txt
Now we are going to merge the content of this external table into our definitive table. Create final table
SQL> CREATE TABLE FINAL_DATA
(
TYPE VARCHAR2(10),
NAME VARCHAR2(20),
REGION VARCHAR2(20),
CITY VARCHAR2(20),
FILENAME VARCHAR2(20),
DATA BLOB
) 2 3 4 5 6 7 8 9 ;
Table created.
insert/merge the data from the external table into the final table
SQL> merge into FINAL_DATA target
using ( select * from EXTERNAL_DATA ) source
on ( target.type = source.type and
target.name = source.name and
target.region = source.region and
target.city = source.city
)
2 3 4 5 6 7 8 when not matched then
9 insert ( type, name, region, city, filename )
values
( source.type , source.name , source.region, source.city , source.filename ); 10 11
3 rows merged.
SQL> commit ;
Commit complete.
Now we need the procedure to convert the blob object ( file ) into a blob object and store it into the table
CREATE OR REPLACE FUNCTION os_file_to_blob(p_dir in varchar2 , p_file_name VARCHAR2) RETURN BLOB AS
dest_loc BLOB := empty_blob();
src_loc BFILE := BFILENAME(p_dir, p_file_name);
BEGIN
DBMS_LOB.OPEN(src_loc, DBMS_LOB.LOB_READONLY);
DBMS_LOB.CREATETEMPORARY(
lob_loc => dest_loc
, cache => true
, dur => dbms_lob.session
);
DBMS_LOB.OPEN(dest_loc, DBMS_LOB.LOB_READWRITE);
DBMS_LOB.LOADFROMFILE(
dest_lob => dest_loc
, src_lob => src_loc
, amount => DBMS_LOB.getLength(src_loc));
DBMS_LOB.CLOSE(dest_loc);
DBMS_LOB.CLOSE(src_loc);
RETURN dest_loc;
END os_file_to_blob;
/
And a function to covert the blob into a clob
create or replace function fil_rep.blob_to_char (b blob)
return clob is
v_clob clob;
n number;
v_start pls_integer := 1;
v_buffer pls_integer := 32767;
v_varchar varchar2(32767);
begin
if (b is null)
then
return null;
end if;
if (dbms_lob.getlength(b)=0)
then
return empty_clob();
end if;
dbms_lob.createtemporary(v_clob,true);
for i in 1..ceil(dbms_lob.getlength(b) / v_buffer)
loop
v_varchar := utl_raw.cast_to_varchar2(dbms_lob.substr(b, v_buffer, v_start));
dbms_lob.writeappend(v_clob, length(v_varchar), v_varchar);
v_start := v_start + v_buffer;
end loop;
RETURN v_clob;
end blob_to_char;
/
Create the functions for blob treatment
SQL> CREATE OR REPLACE FUNCTION os_file_to_blob(p_dir in varchar2 , p_file_name VARCHAR2) RETURN BLOB AS
2 dest_loc BLOB := empty_blob();
3 src_loc BFILE := BFILENAME(p_dir, p_file_name);
4 BEGIN
5 DBMS_LOB.OPEN(src_loc, DBMS_LOB.LOB_READONLY);
6 DBMS_LOB.CREATETEMPORARY(
7 lob_loc => dest_loc
8 , cache => true
9 , dur => dbms_lob.session
10 );
11 DBMS_LOB.OPEN(dest_loc, DBMS_LOB.LOB_READWRITE);
12 DBMS_LOB.LOADFROMFILE(
13 dest_lob => dest_loc
14 , src_lob => src_loc
15 , amount => DBMS_LOB.getLength(src_loc));
16 DBMS_LOB.CLOSE(dest_loc);
17 DBMS_LOB.CLOSE(src_loc);
18 RETURN dest_loc;
19* END os_file_to_blob;
/
Function created.
SQL> create or replace function blob_to_char (b blob)
2 return clob is
3 v_clob clob;
4 n number;
5 v_start pls_integer := 1;
6 v_buffer pls_integer := 32767;
7 v_varchar varchar2(32767);
8 begin
9 if (b is null)
10 then
11 return null;
12 end if;
13 if (dbms_lob.getlength(b)=0)
14 then
15 return empty_clob();
16 end if;
17 dbms_lob.createtemporary(v_clob,true);
18 for i in 1..ceil(dbms_lob.getlength(b) / v_buffer)
19 loop
20 v_varchar := utl_raw.cast_to_varchar2(dbms_lob.substr(b, v_buffer, v_start));
21 dbms_lob.writeappend(v_clob, length(v_varchar), v_varchar);
22 v_start := v_start + v_buffer;
23 end loop;
24 RETURN v_clob;
25* end blob_to_char;
/
Function created.
Now, we need to create a directory for each location we have
Directories to be created
create or replace directory dir_01 as '/test/India/Karnataka/Bengaluru' ;
create or replace directory dir_02 as '/test/Spain/Catalunya/Barcelona' ;
Update the blobs for each directory
SQL> DECLARE
v_blob BLOB;
v_file varchar2(40);
BEGIN
for h in
( select filename from final_data where data is null and name = 'India' )
loop
v_file := h.filename;
begin
v_blob := os_file_to_blob ('DIR_01' , v_file);
update final_data set data = v_blob where filename = v_file;
commit;
exception when others then null;
end;
end loop;
end;
/
PL/SQL procedure successfully completed.
SQL> DECLARE
v_blob BLOB;
v_file varchar2(40);
BEGIN
for h in
( select filename from final_data where data is null and name = 'Spain' )
loop
v_file := h.filename;
begin
v_blob := os_file_to_blob ('DIR_02' , v_file);
update final_data set data = v_blob where filename = v_file;
commit;
exception when others then null;
end;
end loop;
end;
/
PL/SQL procedure successfully completed.
Final select
SQL> select a.filename , blob_to_char(a.data) as content from final_data a ;
FILENAME CONTENT
---------------------------------------- --------------------------------------------------------------------------------
data.txt This is an example of text for this city
data1.txt Another example for Catalonia and Barcelona
data2.txt Second example for Catalonia and Barcelona
SQL>
Obviously, if I would to do it, I would put a field in the table called directory_path, therefore I could make the creation of the directories totally automatic.
I did not publish everything I had in mind, just enough for you to get an idea. In my case
dbms_scheduler
every single daymerge
statement to insert/update in the final table, therefore I only upload the content of new or modified files. For that I retrieve in the csv file the file's timestamp.Please, share your doubts and suggestions.
Upvotes: 1