Reputation: 11
I'm having trouble splitting a large file into several smaller ones when one column contains newlines. The CSV file I'm trying to split uses pipes (|) as delimiters, and rows are separated by newlines (\n). Because one column contains several newlines, a single row can span multiple lines and look something like this:
col1 | col2 | col3| insert something in here
that is meaning
new documents
or formats
random text
text | col5 | col6 | col7
Splitting this file by line count or by bytes can cut a row in the middle of col4. When that happens, the resulting files are malformed and I can't process them later to insert the data into my table.
I tried both split and csplit, but I'm not sure either can produce a clean split based on lines plus the delimiter. If I use a csplit regex that matches on | and newline, it only picks up a line like this: text | col5 | col6 | col7 -> so unfortunately that doesn't work either.
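To make the requirement concrete, here is a minimal Python sketch of a record-aware split. It is not based on split or csplit. It assumes every row has a fixed number of columns (7 in the sample above, so 6 pipes per row), that the free-text column never contains a pipe, and that the last column never contains a newline. The names iter_records, split_records, n_cols and records_per_chunk are hypothetical and only for illustration.

```python
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str], n_cols: int, delim: str = "|") -> Iterator[str]:
    """Yield complete logical rows.

    Physical lines are buffered until the buffer holds n_cols - 1
    delimiters. Assumption: the delimiter never appears inside a field
    and the last column has no embedded newline.
    """
    buf: List[str] = []
    seen = 0
    for line in lines:
        buf.append(line)
        seen += line.count(delim)
        if seen >= n_cols - 1:
            yield "".join(buf)
            buf, seen = [], 0
    if buf:
        # Trailing incomplete row: passed through unchanged.
        yield "".join(buf)


def split_records(
    lines: Iterable[str], n_cols: int, records_per_chunk: int, delim: str = "|"
) -> Iterator[List[str]]:
    """Group complete rows into chunks, so a chunk never ends mid-row."""
    chunk: List[str] = []
    for record in iter_records(lines, n_cols, delim):
        chunk.append(record)
        if len(chunk) == records_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

Each chunk returned by split_records could then be written to its own output file with "".join(chunk). If the text column can contain pipes, counting delimiters is not enough and the file would need quoting or a different row terminator.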
I'm running out of ideas here. Maybe this isn't possible with split and csplit at all, but I'm open to suggestions. Thank you!
Upvotes: 1
Views: 245