Nurdin Ibrisimovic

Reputation: 11

Is there a way to make split / csplit work on a Linux system when a column contains newlines?

I'm having trouble splitting a large file into a number of smaller ones because one column contains newlines. The CSV file I'm trying to split uses pipes (|) as delimiters, and each row is terminated by a newline (\n). Since one column contains several newlines, a single row of the CSV file can look something like this:

col1 | col2 | col3| insert something in here

that is meaning

new documents

or formats

random text

text | col5 | col6 | col7

Splitting this file, whether by line count or by bytes, can cut it in the middle of col4. When that happens, the resulting file is broken and I can't process it later to insert the data into my table.

I tried both split and csplit, but I'm not sure I can get a clean split based on line count plus delimiter. If I use a csplit regex that matches a pipe and a newline, it would only pick up this line: text | col5 | col6 | col7, so that wouldn't work either.

I'm running out of ideas here. Maybe this isn't possible with split and csplit at all, but I'm open to suggestions. Thank you!
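For illustration, here is a minimal Python sketch of a record-aware split, as an alternative to split and csplit rather than a way to make them work. It assumes things the question does not state: every row has a fixed number of columns (7 in the sample), the pipe never appears inside a field, and the multi-line column is not the last one. The names split_records and write_chunks, and the chunk_ file prefix, are hypothetical. The idea is to keep joining physical lines until a row has collected all of its delimiters, and only cut between complete rows.

```python
def split_records(lines, n_cols, delim="|"):
    """Yield complete logical rows, joining physical lines until
    n_cols - 1 delimiters have been seen (assumes no delimiter in field text)."""
    need = n_cols - 1
    buf, seen = [], 0
    for line in lines:
        buf.append(line)
        seen += line.count(delim)
        if seen >= need:
            yield "".join(buf)
            buf, seen = [], 0
    if buf:
        # Trailing incomplete row: passed through unchanged.
        yield "".join(buf)


def write_chunks(src_path, records_per_chunk, n_cols, prefix="chunk_"):
    """Write the source file into numbered chunk files, cutting only
    between complete rows. Returns the list of chunk paths."""
    paths = []
    out = None
    # newline="" keeps the original line endings untouched.
    with open(src_path, newline="") as src:
        for i, record in enumerate(split_records(src, n_cols)):
            if i % records_per_chunk == 0:
                if out is not None:
                    out.close()
                path = f"{prefix}{i // records_per_chunk:04d}.csv"
                out = open(path, "w", newline="")
                paths.append(path)
            out.write(record)
    if out is not None:
        out.close()
    return paths
```

With the sample row above, write_chunks("big.csv", 100000, n_cols=7) would put 100000 complete rows in each chunk, where big.csv and the row count are placeholders. If the multi-line column could be the last one, counting delimiters alone would not be enough to find the row boundary.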

Upvotes: 1

Views: 245

Answers (0)
