Reputation: 5856
In the meantime I wrote an answer to a question that got closed, so I am rewording and re-asking it.
I have a CSV file with 180 million records and 5 columns, like this:
"c a","L G-3 (8) N (4th G P Q C- 4 R- 1 T H- 15.6 I- W 8.1) (B)","C & P_L",1,0
How can I change it to a 3-column structure, like this:
"c a|L G-3 (8) N (4th G P Q C- 4 R- 1 T H- 15.6 I- W 8.1) (B)|C & P_L",1,0
That is, I need to concatenate columns 1, 2 and 3 with |, print the result as one column, and leave the other columns unchanged.
I tried it with regexes:
cat RelatedKW.csv | perl -pe 's/(\|)/\//g'| perl -pe 's/("\s*"|"\s*"\s*\\n$)//g'| perl -pe 's/^,"|,,|"\s*,\s*\"/|/g' | perl -pe 's/\"(\d+),(\d+)\"/ |$1|$2/g' > newRKW4.csv
Is there a better way?
Upvotes: 1
Views: 85
Reputation: 1290
Assuming your data looks exactly like the sample, this should work:
$line =~ s-\",\"-|-g;
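To show what that substitution does on a line shaped like the sample, here is a minimal sketch. The helper name `merge_quoted` and the shortened sample line are made up for illustration. It relies on every text field being quoted and on no field containing the sequence `","` itself, which may not hold for real data.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every "," separator that sits between two
# quoted fields with a pipe. Unquoted fields (such as the trailing numbers)
# are left alone because their separators do not match the pattern.
sub merge_quoted {
    my ($line) = @_;
    $line =~ s/","/|/g;
    return $line;
}

# Shortened, made-up sample line in the same shape as the question's data.
my $line = '"c a","L G-3 (8) N","C & P_L",1,0';
print merge_quoted($line), "\n";
# prints: "c a|L G-3 (8) N|C & P_L",1,0
```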
Upvotes: 0
Reputation: 5856
You should generally avoid parsing CSVs with regexes, as Kent Fredric explains in an answer to another, similar question:
Not using CPAN is really a recipe for disaster.
Please consider this before trying to write your own CSV implementation. Text::CSV is over a hundred lines of code, including fixed bugs and edge cases, and re-writing this from scratch will just make you learn how awful CSV can be the hard way.
It is really bad practice trying to parse CSVs with regexes, because for example, you need to handle:
and so on, all of which Text::CSV will handle for you.
Here's a solution that uses Text::CSV_XS, the C-backed variant of Text::CSV. I'm not a Perl expert, so the following code may be missing some things, but it is probably better than using regexes:
perl -MText::CSV_XS -E '$csv = Text::CSV_XS->new ({ eol => $/ }); $csv->print(*STDOUT, [join(q{|}, @$row[0..2]), @$row[3..4]]) while ($row = $csv->getline(*STDIN))' < csv
Input:
"c a","L G-3 (8) N (4th G P Q C- 4 R- 1 T H- 15.6 I- W 8.1) (B)","C & P_L",1,0
Output:
"c a|L G-3 (8) N (4th G P Q C- 4 R- 1 T H- 15.6 I- W 8.1) (B)|C & P_L",1,0
Some potential problems: it doesn't handle escaping of the | character if any appear in the input, there is no error handling, etc. For a more robust solution you need to write a full Perl script rather than a one-liner.
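A fuller script along those lines might look like the sketch below. It is not a definitive implementation. It assumes Text::CSV_XS is installed and that every record has exactly 5 fields. The helper name `merge_first_three` is made up for illustration. Replacing a literal | with / inside the merged fields is also an assumption, borrowed from the first substitution in the question's own pipeline.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: join the first three fields with "|" and keep the
# remaining fields unchanged. A literal "|" inside the merged fields is
# replaced with "/" first (an assumption taken from the question's regex).
sub merge_first_three {
    my ($row) = @_;
    my @head = map { (my $f = $_ // '') =~ tr{|}{/}; $f } @{$row}[0 .. 2];
    return [ join('|', @head), @{$row}[3 .. $#{$row}] ];
}

# binary => 1 allows embedded newlines and non-ASCII bytes inside fields;
# auto_diag => 1 reports parse errors instead of failing silently.
my $in  = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
# A separate writer object so that eol only affects the output.
my $out = Text::CSV_XS->new({ binary => 1, eol => "\n" });

my $n = 0;
while (my $row = $in->getline(\*STDIN)) {
    $n++;
    if (@$row != 5) {
        # Assumed policy: skip malformed records and report them on STDERR.
        warn "record $n: expected 5 fields, got " . scalar(@$row) . "\n";
        next;
    }
    $out->print(\*STDOUT, merge_first_three($row));
}
```

Saved as, say, merge.pl, it would be run as `perl merge.pl < RelatedKW.csv > newRKW4.csv`. Because it reads and writes one record at a time, memory use should stay flat even for 180 million records.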
Upvotes: 1