Burhan Khalid Butt
Burhan Khalid Butt

Reputation: 275

Eliminating carriage return from an input line while uploading CSV in oracle

Is there any way to remove carriage return from a row that is being uploaded from a csv.

A cell in my sample csv is as follows:

"Administrator,, admin (admin): Mon Jan 25 09:48:22 GMT 2016
please fix this issue ASAP
---
Zaeem,, Muhammad (mzaeem): Mon Jan 25 14:22:05 PKT 2016
What you said has been added
---
."

When I upload this csv, the lines of this cell makes new records. I don't want this to happen. I am using UTL_FILE to upload the data and I have developed a stored procedure for this.

Stored procedure is:

CREATE OR REPLACE PROCEDURE Import_DATA_File (P_FILENAME   IN     VARCHAR2,
                                                P_QUERY_REF IN VARCHAR2,
                                              o_msg           OUT VARCHAR2)
IS
   Infile                UTL_FILE.File_Type;
   Linebuf               VARCHAR2 (4000);
   V_Getstring           VARCHAR2 (100);

   -- Field Values Array
   TYPE Fieldvalue IS TABLE OF VARCHAR2 (100)
                         INDEX BY BINARY_INTEGER;

   Field_Position        Fieldvalue;

   Total_Rec_Count       NUMBER := 0;
   Total_Rec_Processed   NUMBER := 0;
BEGIN
   Infile := UTL_FILE.Fopen ('FILE_UPLOAD_DIR', P_FILENAME, 'R');

   LOOP
      ---
      UTL_FILE.Get_Line (Infile,Linebuf);

      Linebuf := Linebuf || '","';

      FOR I IN 1 .. 51
      LOOP
         Field_Position (I) :=
            Plsql_Delimited_String.Getstring (Linebuf,
                                              I,
                                              FALSE,
                                              '","');
      END LOOP;

      BEGIN
         Total_Rec_Count := Total_Rec_Count + 1;

         -- table insertion 
         INSERT INTO DATA_Q127_CT (query_ref,
                                   col_v1,
                                   col_v2,
                                   col_v3,
                                   col_v4,
                                   col_v5,
                                   col_v6,
                                   col_v7,
                                   col_v8,
                                   col_v9,
                                   col_v10,
                                   col_v11,
                                   col_v12,
                                   col_v13,
                                   col_v14,
                                   col_v15,
                                   col_v16,
                                   col_v17,
                                   col_v18,
                                   col_v19,
                                   col_v20,
                                   col_v21,
                                   col_v22,
                                   col_v23,
                                   col_v24,
                                   col_v25,
                                   col_v26,
                                   col_v27,
                                   col_v28,
                                   col_v29,
                                   col_v30,
                                   col_v31,
                                   col_v32,
                                   col_v33,
                                   col_v34,
                                   col_v35,
                                   col_v36,
                                   col_v37,
                                   col_v38,
                                   col_v39,
                                   col_v40,
                                   col_v41,
                                   col_v42,
                                   col_v43,
                                   col_v44,
                                   col_v45,
                                   col_v46,
                                   col_v47,
                                   col_v48,
                                   col_v49,
                                   col_v50)
              VALUES (
              P_QUERY_REF,
              REPLACE (field_position (1),'"',''),
                      REPLACE (field_position (2),'"',''),
                      REPLACE (field_position (3),'"',''),
                      REPLACE (field_position (4),'"',''),
                      REPLACE (field_position (5),'"',''),
                      REPLACE (field_position (6),'"',''),
                      REPLACE (field_position (7),'"',''),
                      REPLACE (field_position (8),'"',''),
                      REPLACE (field_position (9),'"',''),
                      REPLACE (field_position (10),'"',''),
                      REPLACE (field_position (11),'"',''),
                      REPLACE (field_position (12),'"',''),
                      REPLACE (field_position (13),'"',''),
                      REPLACE (field_position (14),'"',''),
                      REPLACE (field_position (15),'"',''),
                      REPLACE (field_position (16),'"',''),
                      REPLACE (field_position (17),'"',''),
                      REPLACE (field_position (18),'"',''),
                      REPLACE (field_position (19),'"',''),
                      REPLACE (field_position (20),'"',''),
                      REPLACE (field_position (21),'"',''),
                      REPLACE (field_position (22),'"',''),
                      REPLACE (field_position (23),'"',''),
                      REPLACE (field_position (24),'"',''),
                      REPLACE (field_position (25),'"',''),
                      REPLACE (field_position (26),'"',''),
                      REPLACE (field_position (27),'"',''),
                      REPLACE (field_position (28),'"',''),
                      REPLACE (field_position (29),'"',''),
                      REPLACE (field_position (30),'"',''),
                      REPLACE (field_position (31),'"',''),
                      REPLACE (field_position (32),'"',''),
                      REPLACE (field_position (33),'"',''),
                      REPLACE (field_position (34),'"',''),
                      REPLACE (field_position (35),'"',''),
                      REPLACE (field_position (36),'"',''),
                      REPLACE (field_position (37),'"',''),
                      REPLACE (field_position (38),'"',''),
                      REPLACE (field_position (39),'"',''),
                      REPLACE (field_position (40),'"',''),
                      REPLACE (field_position (41),'"',''),
                      REPLACE (field_position (42),'"',''),
                      REPLACE (field_position (43),'"',''),
                      REPLACE (field_position (44),'"',''),
                      REPLACE (field_position (45),'"',''),
                      REPLACE (field_position (46),'"',''),
                      REPLACE (field_position (47),'"',''),
                      REPLACE (field_position (48),'"',''),
                      REPLACE (field_position (49),'"',''),
                      REPLACE (field_position (50),'"','')
                      );


         Total_Rec_Processed := Total_Rec_Processed + 1;
      EXCEPTION
         WHEN OTHERS
         THEN
            -- ignoring error during database insertion
            NULL;
      END;
   END LOOP;

   IF UTL_FILE.is_open (infile)
   THEN
      UTL_FILE.Fclose (Infile);
   END IF;
EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      IF UTL_FILE.is_open (infile)
      THEN
         UTL_FILE.Fclose (Infile);
      END IF;

      IF total_rec_processed > 0
      THEN
         COMMIT;

         DELETE FROM DATA_Q127_CT
               WHERE QUERY_REF = 'QUERY_REF'
               OR COL_V50 = 'COL_V50';
               COMMIT;
      END IF;
   WHEN OTHERS
   THEN
      IF UTL_FILE.is_open (infile)
      THEN
         UTL_FILE.Fclose (Infile);
      END IF;

      o_msg := SQLERRM;
END;

DDL of the function Plsql_Delimited_String.Getstring is:

   FUNCTION GetString (Source_string    IN VARCHAR2,
                       Field_position   IN NUMBER,
                       UnTerminated     IN BOOLEAN DEFAULT FALSE,
                       Delimiter        IN VARCHAR2 DEFAULT ',')
      RETURN VARCHAR2
   IS
      iPtrEnd           PLS_INTEGER := 0;
      iPtrStart         PLS_INTEGER := 0;
      vcSourceStrCopy   VARCHAR2 (32000) := Source_string;
   BEGIN
      IF UnTerminated
      THEN
         vcSourceStrCopy := vcSourceStrCopy || Delimiter;
      END IF;

      IF Field_Position > 1
      THEN
         iPtrStart :=
            INSTR (vcSourceStrCopy,
                   Delimiter,
                   1,
                   Field_Position - 1)
            + LENGTH (Delimiter);
      ELSE
         iPtrStart := 1;
      END IF;

      iPtrEnd :=
         INSTR (vcSourceStrCopy,
                Delimiter,
                1,
                Field_Position);
      RETURN SUBSTR (vcSourceStrCopy, iPtrStart, (iPtrEnd - iPtrStart));
   END GetString;

I want to know, is there any way by which I can eliminate carriage returns from the incoming data and make the value (as mentioned above) as a single row.

As per my analysis when the line is read during UTL_FILE.Get_Line (Infile,Linebuf) function call. This function somehow only reads the first line of the string. In other words, only this line "Administrator,, admin (admin): Mon Jan 25 09:48:22 GMT 2016 is read and the rest of the string is simply ignored and is considered as a new line.

I want to know how should I write such a loop that will keep on reading the line until it finds the whole string.

Please help!

Upvotes: 1

Views: 2905

Answers (3)

Adam Silenko
Adam Silenko

Reputation: 3108

If you seek delimiter of quoted field you should use

'"' || Delimiter || '"' 

as substring for instr function.
You can use replace to remove carriage return char like this, and you can remove double quotes in GetString functions using Trim

RETURN Trim( BOTH '"' FROM  REPLACE(SUBSTR (vcSourceStrCopy, iPtrStart, (iPtrEnd - iPtrStart)), '
', ' ')); -- here is next line, after '

You can also use CHR like this: replace (str, chr(10), ' '). or if you have line feed & carriage returns, use chr(10) || chr(13) or chr(13) || chr(10).
You can test what char are used to brake line using hex editor or echo char using

dbms_output.put_line(asc(unknown_char));

you can remove all line break char using translate:

RETURN Trim( BOTH '"' FROM  TRANSLATE(SUBSTR (vcSourceStrCopy, iPtrStart, (iPtrEnd - iPtrStart)), chr(13) || chr(10) || chr(9), '   '));

EDIT: I'm understood your problem after answer Gary_W
To be sure you reed all field in row, you should test last readded char, if it is double quote then it is end of row, if not read next line and add to buffer, so try read row like this:

Linebuf := '';
LOOP
  UTL_FILE.Get_Line (Infile,part_Linebuf);
  Linebuf := Linebuf || part_Linebuf;
  EXIT WHEN SUBSTR(part_Linebuf, -1, 1) = '"'; -- if end of row is: "
  --EXIT WHEN SUBSTR(part_Linebuf, -2, 2) = '",'; -- if end of row is: ",
END LOOP;   

Upvotes: 1

Burhan Khalid Butt
Burhan Khalid Butt

Reputation: 275

Thanks for all the folks out there who took time and helped me. Well here the solution. It's a bit slow but it does the job.

CREATE OR REPLACE PROCEDURE ARENA.Import_DATA_File5 (P_FILENAME   IN     VARCHAR2,
                                                P_QUERY_REF IN VARCHAR2,
                                              o_msg           OUT VARCHAR2)
IS
   Infile                UTL_FILE.File_Type;
   Linebuf               VARCHAR2 (32000);
   V_Getstring           VARCHAR2 (32000);
   v_sql VARCHAR2 (32000);
--   v_count NUMBER;

   -- Field Values Array
   TYPE Fieldvalue IS TABLE OF VARCHAR2 (4000)
                         INDEX BY BINARY_INTEGER;

   Field_Position        Fieldvalue;

   Total_Rec_Count       NUMBER := 0;
   Total_Rec_Processed   NUMBER := 0;

   V_RECORD_COMP         NUMBER := 0;
   V_TEMP_STR            VARCHAR2(32000);
   V_TEMP                NUMBER := 0;
   V_MULTIREC_START      NUMBER := 0;
   V_REC_POSTED          NUMBER := 1;

BEGIN
   Infile := UTL_FILE.Fopen ('FILE_UPLOAD_DIR', P_FILENAME, 'R');

   LOOP
      ---

      IF V_REC_POSTED = 1 THEN      
         UTL_FILE.Get_Line (Infile,Linebuf);
         V_REC_POSTED := 0;         
      END IF;                       

      SELECT INSTR(Linebuf, '","', 1, 7)
      INTO   V_TEMP
      FROM   DUAL;

      IF V_TEMP <> 0 OR V_RECORD_COMP <> 0 THEN --Single-line record or multi-line record picking complete
         V_TEMP_STR    := '';
         V_RECORD_COMP := 0;
         V_REC_POSTED  := 1;


      Linebuf := Linebuf || '","';
--   INSERT INTO DATA_FLU_TEST (COMMENTS)
--        VALUES ('---Linebuf
--        ' || Linebuf);

      FOR I IN 1 .. 51
      LOOP
         Field_Position (I) :=
            Plsql_Delimited_String.Getstring (Linebuf,--REPLACE(REPLACE(Linebuf, CHR(13)), CHR(10)),
                                              I,
                                              FALSE,
                                              '","');
      END LOOP;

      BEGIN
         Total_Rec_Count := Total_Rec_Count + 1;

         -- table insertion 
         INSERT INTO DATA_Q127_CT (query_ref,
                                   col_v1,
                                   col_v2,
                                   col_v3,
                                   col_v4,
                                   col_v5,
                                   col_v6,
                                   col_v7,
                                   col_v8,
                                   col_v9,
                                   col_v10,
                                   col_v11,
                                   col_v12,
                                   col_v13,
                                   col_v14,
                                   col_v15,
                                   col_v16,
                                   col_v17,
                                   col_v18,
                                   col_v19,
                                   col_v20,
                                   col_v21,
                                   col_v22,
                                   col_v23,
                                   col_v24,
                                   col_v25,
                                   col_v26,
                                   col_v27,
                                   col_v28,
                                   col_v29,
                                   col_v30,
                                   col_v31,
                                   col_v32,
                                   col_v33,
                                   col_v34,
                                   col_v35,
                                   col_v36,
                                   col_v37,
                                   col_v38,
                                   col_v39,
                                   col_v40,
                                   col_v41,
                                   col_v42,
                                   col_v43,
                                   col_v44,
                                   col_v45,
                                   col_v46,
                                   col_v47,
                                   col_v48,
                                   col_v49,
                                   col_v50)
              VALUES (
              P_QUERY_REF,
              field_position (1),
                      field_position (2),
                      field_position (3),
--                      REPLACE (field_position (3),CHR(10),' '),
--                      REPLACE(field_position (3), CHR(13)||CHR(10), ' '),
--                      TRIM(TRANSLATE(field_position (3), CHR(10), '\\n')),
--                      regexp_replace(field_position (3),'[[:cntrl:]]',''),
--Translate(field_position (3), CHR(13),'') ,
                      field_position (4),
                      field_position (5),
                      field_position (6),
                      field_position (7),
                      field_position (8),
                      field_position (9),
                      field_position (10),
                      field_position (11),
                      field_position (12),
                      field_position (13),
                      field_position (14),
                      field_position (15),
                      field_position (16),
                      field_position (17),
                      field_position (18),
                      field_position (19),
                      field_position (20),
                      field_position (21),
                      field_position (22),
                      field_position (23),
                      field_position (24),
                      field_position (25),
                      field_position (26),
                      field_position (27),
                      field_position (28),
                      field_position (29),
                      field_position (30),
                      field_position (31),
                      field_position (32),
                      field_position (33),
                      field_position (34),
                      field_position (35),
                      field_position (36),
                      field_position (37),
                      field_position (38),
                      field_position (39),
                      field_position (40),
                      field_position (41),
                      field_position (42),
                      field_position (43),
                      field_position (44),
                      field_position (45),
                      field_position (46),
                      field_position (47),
                      field_position (48),
                      field_position (49),
                      field_position (50)
                      );

--   INSERT INTO DATA_FLU_TEST (COMMENTS)
--        VALUES ('---Import_DATA_File
--        ' || field_position (3));

   COMMIT;

         Total_Rec_Processed := Total_Rec_Processed + 1;
      EXCEPTION
         WHEN OTHERS
         THEN

            NULL;
      END;


   ELSE
     V_TEMP_STR       := V_TEMP_STR || Linebuf;
     V_REC_POSTED     := 1;
     IF INSTR(Linebuf, '","', 1, 1) <> 0 AND V_MULTIREC_START = 1 THEN      --End of multi-line record
        SELECT REPLACE(REPLACE(V_TEMP_STR, CHR(10) , ''), CHR(13), '')
        INTO   Linebuf
        FROM   DUAL;
        V_RECORD_COMP := 1;
        V_REC_POSTED  := 0;
        V_MULTIREC_START     := 0;
     END IF;
     V_MULTIREC_START := 1;
   END IF;


   END LOOP;

   IF UTL_FILE.is_open (infile)
   THEN
      UTL_FILE.Fclose (Infile);
   END IF;




EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      IF UTL_FILE.is_open (infile)
      THEN
         UTL_FILE.Fclose (Infile);
      END IF;

      IF total_rec_processed > 0
      THEN
         COMMIT;

         DELETE FROM DATA_Q127_CT
               WHERE QUERY_REF = 'QUERY_REF'
               OR COL_V50 = 'COL_V50';
               COMMIT;
      END IF;
   WHEN OTHERS
   THEN
      IF UTL_FILE.is_open (infile)
      THEN
         UTL_FILE.Fclose (Infile);
      END IF;
v_sql := 'Error in '|| $$plsql_unit || ' at ' || $$plsql_line;
      o_msg := SQLERRM || '
      '|| v_sql;
--      DBMS_OUTPUT.put_line(v_sql);
END;

Upvotes: 0

Gary_W
Gary_W

Reputation: 10360

If I understand the problem correctly, at the top of the loop where you call UTL_FILE.Get_Line (Infile,Linebuf); you are trying to read in a user-entered text string (logically one field to go to one column) but that text string contains carriage returns which UTL_FILE.Get_Line does not handle and which you need to strip out.

Can you get the source system to strip them out before you get the file or can you pre-process the file before reading it? Since you used the word "cell" I assume the data is coming from a spreadsheet. Can your data source or you search/replace there first? That would be preferable.

Otherwise you are going to have to add logic to detect you are at the start of a multi-line text field and handle that. Currently you read in a line and append to linebuf. One way you might handle it is to read the line into a tmplinebuf instead, than take a look at that. If it starts and ends with double-quotes (just assuming text fields are surrounded with quotes based on the only field you showed us) then it's complete so append THAT to the main linebuf and get the next line. However if it starts with a double quote but does not end with one you are in a multi-line field so keep reading into tmplinebuf until a line is encountered that does not start with a double-quote but ends with one. THEN the mutli-line field is contained with tmplinebuf so append tmplinebuf to the main linebuf and continue. Think of it like a subloop to read all lines of a mutli-line text field when one is encountered and build it in a separate buffer before adding it to your main linebuf. Something like that. Technically you could read the multi-line text right into the linebuf, you really don't need the tmplinebuf as long as you handle building the format correctly.

BIG CAVEAT the text can't contain double-quotes :-/ I highly recommend scrubbing either by the source or by pre-processing by you before reading.

EDIT: Another thought; could the source supply you with the length of the data in the cell (field)? If this was available to you, there is a third argument to GET_LINE() which is the length of data to read which supercedes GET_LINE() stopping at the end-of-line character. Read into a tmplinebuf, use REPLACE() to strip CHR(10)'s, append tmplinebuf to linebuf?

Upvotes: 3

Related Questions