Reputation: 365
My task is to read a CSV file from one location, do some manipulation in memory in a dataframe, and then write the file to another location.
The source file is '||' separated, and the target file has to be ',' separated.
I have to do this for multiple files with different columns.
In one of the source CSVs, one of the columns contains newline characters within the column.
example source CSV file:
id||notes<CR><LF>
1||notesLine1<CR><LF>
2||notesLine1<CR><LF>
notesLine2<CR><LF>
3||notesLine1: notesLine2<CR><LF>
Note that the line separator is <CR><LF>, and the newline characters within the 'notes' column are also <CR><LF>. I cannot change the source; however, I can have an intermediate layer in memory or on disk if any modification is required.
code:
...
df_target = pd.read_csv(source_file, dtype=None, parse_dates=True, keep_default_na=False, header=None, sep=r"\|\|", engine='python', encoding='utf-8')
df_target.to_csv(target_file,header=header_list,index=False,quoting=csv.QUOTE_ALL)
...
current output:
"id","notes"<CR><LF>
"1","notesLine1"<CR><LF>
"2","notesLine1"<CR><LF>
"notesLine2",""<CR><LF> -- extra unwanted row being created
"3","notesLine1: notesLine2"<CR><LF>
Note that the row is split into two, making the total 4 rows. I don't want this to happen!
expected output:
"id","notes"<CR><LF>
"1","notesLine1"<CR><LF>
"2","notesLine1 \n notesLine2",""<CR><LF>
"3","notesLine1: notesLine2"<CR><LF>
Note: instead of being split into two rows, the data can contain '\n' and stay within the same row, so that the total is 3 rows and not 4.
Is there a way that this can be handled?
Upvotes: 0
Views: 583
Reputation: 831
See if this helps :
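with open("sample.csv", "r+") as file:
    text = ""
    for line in file:
        # A line starting with a digit begins a new record (assumes numeric ids).
        if line[:1].isdigit():
            text = "{}\n{}".format(text, line.strip())
        else:
            # Otherwise it is the header or continues the previous record.
            text = "{} {}".format(text, line.strip())
    file.seek(0)
    file.write(text[1:])
    # Drop leftover bytes in case the rewritten text is shorter than the original.
    file.truncate()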
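If the source file must stay untouched, the same idea can be done in memory. This is only a rough sketch, and it swaps the "starts with a digit" check for a different heuristic: a line that does not contain the '||' delimiter is treated as a continuation of the previous record. The names parse_records and to_quoted_csv are made up for illustration, and the joiner is assumed to be the literal two-character text \n from the expected output.

```python
import csv
import io

DELIM = "||"  # source delimiter from the question


def parse_records(raw_text, delim=DELIM, joiner=" \\n "):
    """Split raw text into rows, merging continuation lines.

    Assumption: a continuation line never contains the delimiter.
    """
    records = []
    for line in raw_text.splitlines():  # splitlines() handles CRLF
        if not line:
            continue  # skip blank lines in this sketch
        if delim in line or not records:
            records.append(line)
        else:
            # No delimiter: append to the previous record.
            records[-1] += joiner + line
    return [rec.split(delim) for rec in records]


def to_quoted_csv(rows):
    """Write rows as comma-separated text with every field quoted."""
    out = io.StringIO(newline="")
    # csv.writer terminates lines with CRLF by default.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return out.getvalue()


source = (
    "id||notes\r\n"
    "1||notesLine1\r\n"
    "2||notesLine1\r\n"
    "notesLine2\r\n"
    "3||notesLine1: notesLine2\r\n"
)
rows = parse_records(source)
result = to_quoted_csv(rows)
```

For the dataframe step, the parsed rows could be handed to pandas with pd.DataFrame(rows[1:], columns=rows[0]) before writing. The heuristic breaks if a note continuation itself contains '||', so check it against the real data first.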
Upvotes: 0
Reputation: 831
CR and LF are control characters, respectively coded 0x0D (13 decimal) and 0x0A (10 decimal).
They are used to mark a line break in the file.
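A quick way to see these two characters, using a made-up sample line rather than the real file:

```python
# Hypothetical sample line ending with CR LF, as in the question's source file.
sample_line = "1||notesLine1\r\n"

raw = sample_line.encode("utf-8")
terminator = raw[-2:]                 # the last two bytes of the line
terminator_codes = list(terminator)   # their numeric values

print(terminator)        # b'\r\n'
print(terminator_codes)  # [13, 10]
```

When reading a real file, opening it in binary mode ("rb") shows these bytes as they are, whereas the default text mode translates CR LF to a single "\n" on read.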
Upvotes: 1