Reputation: 275
Is there any way to remove carriage return from a row that is being uploaded from a csv.
A cell in my sample csv is as follows:
"Administrator,, admin (admin): Mon Jan 25 09:48:22 GMT 2016
please fix this issue ASAP
---
Zaeem,, Muhammad (mzaeem): Mon Jan 25 14:22:05 PKT 2016
What you said has been added
---
."
When I upload this csv, the lines of this cell makes new records. I don't want this to happen. I am using UTL_FILE to upload the data and I have developed a stored procedure for this.
Stored procedure is:
CREATE OR REPLACE PROCEDURE Import_DATA_File (P_FILENAME IN VARCHAR2,
P_QUERY_REF IN VARCHAR2,
o_msg OUT VARCHAR2)
IS
Infile UTL_FILE.File_Type;
Linebuf VARCHAR2 (4000);
V_Getstring VARCHAR2 (100);
-- Field Values Array
TYPE Fieldvalue IS TABLE OF VARCHAR2 (100)
INDEX BY BINARY_INTEGER;
Field_Position Fieldvalue;
Total_Rec_Count NUMBER := 0;
Total_Rec_Processed NUMBER := 0;
BEGIN
Infile := UTL_FILE.Fopen ('FILE_UPLOAD_DIR', P_FILENAME, 'R');
LOOP
---
UTL_FILE.Get_Line (Infile,Linebuf);
Linebuf := Linebuf || '","';
FOR I IN 1 .. 51
LOOP
Field_Position (I) :=
Plsql_Delimited_String.Getstring (Linebuf,
I,
FALSE,
'","');
END LOOP;
BEGIN
Total_Rec_Count := Total_Rec_Count + 1;
-- table insertion
INSERT INTO DATA_Q127_CT (query_ref,
col_v1,
col_v2,
col_v3,
col_v4,
col_v5,
col_v6,
col_v7,
col_v8,
col_v9,
col_v10,
col_v11,
col_v12,
col_v13,
col_v14,
col_v15,
col_v16,
col_v17,
col_v18,
col_v19,
col_v20,
col_v21,
col_v22,
col_v23,
col_v24,
col_v25,
col_v26,
col_v27,
col_v28,
col_v29,
col_v30,
col_v31,
col_v32,
col_v33,
col_v34,
col_v35,
col_v36,
col_v37,
col_v38,
col_v39,
col_v40,
col_v41,
col_v42,
col_v43,
col_v44,
col_v45,
col_v46,
col_v47,
col_v48,
col_v49,
col_v50)
VALUES (
P_QUERY_REF,
REPLACE (field_position (1),'"',''),
REPLACE (field_position (2),'"',''),
REPLACE (field_position (3),'"',''),
REPLACE (field_position (4),'"',''),
REPLACE (field_position (5),'"',''),
REPLACE (field_position (6),'"',''),
REPLACE (field_position (7),'"',''),
REPLACE (field_position (8),'"',''),
REPLACE (field_position (9),'"',''),
REPLACE (field_position (10),'"',''),
REPLACE (field_position (11),'"',''),
REPLACE (field_position (12),'"',''),
REPLACE (field_position (13),'"',''),
REPLACE (field_position (14),'"',''),
REPLACE (field_position (15),'"',''),
REPLACE (field_position (16),'"',''),
REPLACE (field_position (17),'"',''),
REPLACE (field_position (18),'"',''),
REPLACE (field_position (19),'"',''),
REPLACE (field_position (20),'"',''),
REPLACE (field_position (21),'"',''),
REPLACE (field_position (22),'"',''),
REPLACE (field_position (23),'"',''),
REPLACE (field_position (24),'"',''),
REPLACE (field_position (25),'"',''),
REPLACE (field_position (26),'"',''),
REPLACE (field_position (27),'"',''),
REPLACE (field_position (28),'"',''),
REPLACE (field_position (29),'"',''),
REPLACE (field_position (30),'"',''),
REPLACE (field_position (31),'"',''),
REPLACE (field_position (32),'"',''),
REPLACE (field_position (33),'"',''),
REPLACE (field_position (34),'"',''),
REPLACE (field_position (35),'"',''),
REPLACE (field_position (36),'"',''),
REPLACE (field_position (37),'"',''),
REPLACE (field_position (38),'"',''),
REPLACE (field_position (39),'"',''),
REPLACE (field_position (40),'"',''),
REPLACE (field_position (41),'"',''),
REPLACE (field_position (42),'"',''),
REPLACE (field_position (43),'"',''),
REPLACE (field_position (44),'"',''),
REPLACE (field_position (45),'"',''),
REPLACE (field_position (46),'"',''),
REPLACE (field_position (47),'"',''),
REPLACE (field_position (48),'"',''),
REPLACE (field_position (49),'"',''),
REPLACE (field_position (50),'"','')
);
Total_Rec_Processed := Total_Rec_Processed + 1;
EXCEPTION
WHEN OTHERS
THEN
-- ignoring error during database insertion
NULL;
END;
END LOOP;
IF UTL_FILE.is_open (infile)
THEN
UTL_FILE.Fclose (Infile);
END IF;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
IF UTL_FILE.is_open (infile)
THEN
UTL_FILE.Fclose (Infile);
END IF;
IF total_rec_processed > 0
THEN
COMMIT;
DELETE FROM DATA_Q127_CT
WHERE QUERY_REF = 'QUERY_REF'
OR COL_V50 = 'COL_V50';
COMMIT;
END IF;
WHEN OTHERS
THEN
IF UTL_FILE.is_open (infile)
THEN
UTL_FILE.Fclose (Infile);
END IF;
o_msg := SQLERRM;
END;
DDL of the function Plsql_Delimited_String.Getstring is:
FUNCTION GetString (Source_string IN VARCHAR2,
Field_position IN NUMBER,
UnTerminated IN BOOLEAN DEFAULT FALSE,
Delimiter IN VARCHAR2 DEFAULT ',')
RETURN VARCHAR2
IS
iPtrEnd PLS_INTEGER := 0;
iPtrStart PLS_INTEGER := 0;
vcSourceStrCopy VARCHAR2 (32000) := Source_string;
BEGIN
IF UnTerminated
THEN
vcSourceStrCopy := vcSourceStrCopy || Delimiter;
END IF;
IF Field_Position > 1
THEN
iPtrStart :=
INSTR (vcSourceStrCopy,
Delimiter,
1,
Field_Position - 1)
+ LENGTH (Delimiter);
ELSE
iPtrStart := 1;
END IF;
iPtrEnd :=
INSTR (vcSourceStrCopy,
Delimiter,
1,
Field_Position);
RETURN SUBSTR (vcSourceStrCopy, iPtrStart, (iPtrEnd - iPtrStart));
END GetString;
I want to know, is there any way by which I can eliminate carriage returns from the incoming data and make the value (as mentioned above) as a single row.
As per my analysis when the line is read during UTL_FILE.Get_Line (Infile,Linebuf) function call. This function somehow only reads the first line of the string. In other words, only this line "Administrator,, admin (admin): Mon Jan 25 09:48:22 GMT 2016 is read and the rest of the string is simply ignored and is considered as a new line.
I want to know how should I write such a loop that will keep on reading the line until it finds the whole string.
Please help!
Upvotes: 1
Views: 2905
Reputation: 3108
If you seek delimiter of quoted field you should use
'"' || Delimiter || '"'
as substring for instr function.
You can use replace to remove carriage return char like this, and you can remove double quotes in GetString functions using Trim
RETURN Trim( BOTH '"' FROM REPLACE(SUBSTR (vcSourceStrCopy, iPtrStart, (iPtrEnd - iPtrStart)), '
', ' ')); -- here is next line, after '
You can also use CHR like this: replace (str, chr(10), ' '). or if you have line feed & carriage returns, use chr(10) || chr(13) or chr(13) || chr(10).
You can test what char are used to brake line using hex editor or echo char using
dbms_output.put_line(asc(unknown_char));
you can remove all line break char using translate:
RETURN Trim( BOTH '"' FROM TRANSLATE(SUBSTR (vcSourceStrCopy, iPtrStart, (iPtrEnd - iPtrStart)), chr(13) || chr(10) || chr(9), ' '));
EDIT:
I'm understood your problem after answer Gary_W
To be sure you reed all field in row, you should test last readded char, if it is double quote then it is end of row, if not read next line and add to buffer, so try read row like this:
Linebuf := '';
LOOP
UTL_FILE.Get_Line (Infile,part_Linebuf);
Linebuf := Linebuf || part_Linebuf;
EXIT WHEN SUBSTR(part_Linebuf, -1, 1) = '"'; -- if end of row is: "
--EXIT WHEN SUBSTR(part_Linebuf, -2, 2) = '",'; -- if end of row is: ",
END LOOP;
Upvotes: 1
Reputation: 275
Thanks for all the folks out there who took time and helped me. Well here the solution. It's a bit slow but it does the job.
CREATE OR REPLACE PROCEDURE ARENA.Import_DATA_File5 (P_FILENAME IN VARCHAR2,
P_QUERY_REF IN VARCHAR2,
o_msg OUT VARCHAR2)
IS
Infile UTL_FILE.File_Type;
Linebuf VARCHAR2 (32000);
V_Getstring VARCHAR2 (32000);
v_sql VARCHAR2 (32000);
-- v_count NUMBER;
-- Field Values Array
TYPE Fieldvalue IS TABLE OF VARCHAR2 (4000)
INDEX BY BINARY_INTEGER;
Field_Position Fieldvalue;
Total_Rec_Count NUMBER := 0;
Total_Rec_Processed NUMBER := 0;
V_RECORD_COMP NUMBER := 0;
V_TEMP_STR VARCHAR2(32000);
V_TEMP NUMBER := 0;
V_MULTIREC_START NUMBER := 0;
V_REC_POSTED NUMBER := 1;
BEGIN
Infile := UTL_FILE.Fopen ('FILE_UPLOAD_DIR', P_FILENAME, 'R');
LOOP
---
IF V_REC_POSTED = 1 THEN
UTL_FILE.Get_Line (Infile,Linebuf);
V_REC_POSTED := 0;
END IF;
SELECT INSTR(Linebuf, '","', 1, 7)
INTO V_TEMP
FROM DUAL;
IF V_TEMP <> 0 OR V_RECORD_COMP <> 0 THEN --Single-line record or multi-line record picking complete
V_TEMP_STR := '';
V_RECORD_COMP := 0;
V_REC_POSTED := 1;
Linebuf := Linebuf || '","';
-- INSERT INTO DATA_FLU_TEST (COMMENTS)
-- VALUES ('---Linebuf
-- ' || Linebuf);
FOR I IN 1 .. 51
LOOP
Field_Position (I) :=
Plsql_Delimited_String.Getstring (Linebuf,--REPLACE(REPLACE(Linebuf, CHR(13)), CHR(10)),
I,
FALSE,
'","');
END LOOP;
BEGIN
Total_Rec_Count := Total_Rec_Count + 1;
-- table insertion
INSERT INTO DATA_Q127_CT (query_ref,
col_v1,
col_v2,
col_v3,
col_v4,
col_v5,
col_v6,
col_v7,
col_v8,
col_v9,
col_v10,
col_v11,
col_v12,
col_v13,
col_v14,
col_v15,
col_v16,
col_v17,
col_v18,
col_v19,
col_v20,
col_v21,
col_v22,
col_v23,
col_v24,
col_v25,
col_v26,
col_v27,
col_v28,
col_v29,
col_v30,
col_v31,
col_v32,
col_v33,
col_v34,
col_v35,
col_v36,
col_v37,
col_v38,
col_v39,
col_v40,
col_v41,
col_v42,
col_v43,
col_v44,
col_v45,
col_v46,
col_v47,
col_v48,
col_v49,
col_v50)
VALUES (
P_QUERY_REF,
field_position (1),
field_position (2),
field_position (3),
-- REPLACE (field_position (3),CHR(10),' '),
-- REPLACE(field_position (3), CHR(13)||CHR(10), ' '),
-- TRIM(TRANSLATE(field_position (3), CHR(10), '\\n')),
-- regexp_replace(field_position (3),'[[:cntrl:]]',''),
--Translate(field_position (3), CHR(13),'') ,
field_position (4),
field_position (5),
field_position (6),
field_position (7),
field_position (8),
field_position (9),
field_position (10),
field_position (11),
field_position (12),
field_position (13),
field_position (14),
field_position (15),
field_position (16),
field_position (17),
field_position (18),
field_position (19),
field_position (20),
field_position (21),
field_position (22),
field_position (23),
field_position (24),
field_position (25),
field_position (26),
field_position (27),
field_position (28),
field_position (29),
field_position (30),
field_position (31),
field_position (32),
field_position (33),
field_position (34),
field_position (35),
field_position (36),
field_position (37),
field_position (38),
field_position (39),
field_position (40),
field_position (41),
field_position (42),
field_position (43),
field_position (44),
field_position (45),
field_position (46),
field_position (47),
field_position (48),
field_position (49),
field_position (50)
);
-- INSERT INTO DATA_FLU_TEST (COMMENTS)
-- VALUES ('---Import_DATA_File
-- ' || field_position (3));
COMMIT;
Total_Rec_Processed := Total_Rec_Processed + 1;
EXCEPTION
WHEN OTHERS
THEN
NULL;
END;
ELSE
V_TEMP_STR := V_TEMP_STR || Linebuf;
V_REC_POSTED := 1;
IF INSTR(Linebuf, '","', 1, 1) <> 0 AND V_MULTIREC_START = 1 THEN --End of multi-line record
SELECT REPLACE(REPLACE(V_TEMP_STR, CHR(10) , ''), CHR(13), '')
INTO Linebuf
FROM DUAL;
V_RECORD_COMP := 1;
V_REC_POSTED := 0;
V_MULTIREC_START := 0;
END IF;
V_MULTIREC_START := 1;
END IF;
END LOOP;
IF UTL_FILE.is_open (infile)
THEN
UTL_FILE.Fclose (Infile);
END IF;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
IF UTL_FILE.is_open (infile)
THEN
UTL_FILE.Fclose (Infile);
END IF;
IF total_rec_processed > 0
THEN
COMMIT;
DELETE FROM DATA_Q127_CT
WHERE QUERY_REF = 'QUERY_REF'
OR COL_V50 = 'COL_V50';
COMMIT;
END IF;
WHEN OTHERS
THEN
IF UTL_FILE.is_open (infile)
THEN
UTL_FILE.Fclose (Infile);
END IF;
v_sql := 'Error in '|| $$plsql_unit || ' at ' || $$plsql_line;
o_msg := SQLERRM || '
'|| v_sql;
-- DBMS_OUTPUT.put_line(v_sql);
END;
Upvotes: 0
Reputation: 10360
If I understand the problem correctly, at the top of the loop where you call UTL_FILE.Get_Line (Infile,Linebuf);
you are trying to read in a user-entered text string (logically one field to go to one column) but that text string contains carriage returns which UTL_FILE.Get_Line
does not handle and which you need to strip out.
Can you get the source system to strip them out before you get the file or can you pre-process the file before reading it? Since you used the word "cell" I assume the data is coming from a spreadsheet. Can your data source or you search/replace there first? That would be preferable.
Otherwise you are going to have to add logic to detect you are at the start of a multi-line text field and handle that. Currently you read in a line and append to linebuf. One way you might handle it is to read the line into a tmplinebuf instead, than take a look at that. If it starts and ends with double-quotes (just assuming text fields are surrounded with quotes based on the only field you showed us) then it's complete so append THAT to the main linebuf and get the next line. However if it starts with a double quote but does not end with one you are in a multi-line field so keep reading into tmplinebuf until a line is encountered that does not start with a double-quote but ends with one. THEN the mutli-line field is contained with tmplinebuf so append tmplinebuf to the main linebuf and continue. Think of it like a subloop to read all lines of a mutli-line text field when one is encountered and build it in a separate buffer before adding it to your main linebuf. Something like that. Technically you could read the multi-line text right into the linebuf, you really don't need the tmplinebuf as long as you handle building the format correctly.
BIG CAVEAT the text can't contain double-quotes :-/ I highly recommend scrubbing either by the source or by pre-processing by you before reading.
EDIT: Another thought; could the source supply you with the length of the data in the cell (field)? If this was available to you, there is a third argument to GET_LINE() which is the length of data to read which supercedes GET_LINE() stopping at the end-of-line character. Read into a tmplinebuf, use REPLACE() to strip CHR(10)'s, append tmplinebuf to linebuf?
Upvotes: 3