MJCS

Reputation: 139

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is essentially an entire Excel sheet flattened onto a single line. For example:

Name,Age,Year,Michael,27,2018,Carl,19,2018

I need to change every third occurrence of a comma into a new line so that I get

Name,Age,Year 
Michael,27,2018 
Carl,19,2018

Please let me know if that is too ambiguous and as always thank you in advance for all the help!

Upvotes: 0

Views: 256

Answers (5)

potong

Reputation: 58371

This might work for you (GNU sed):

sed 's/,/\n/3;P;D' file

Replace the third , in the pattern space with a newline, print up to that newline, delete what was printed, and repeat on the remainder of the line.
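To make the loop easier to follow, here is a minimal sketch that wraps the command above in a shell function and annotates each sed command. The function name split_every_third_comma is hypothetical, and the \n in the replacement assumes GNU sed, as the answer already does.

```shell
# Hypothetical wrapper around the sed one-liner; reads from stdin.
# Assumes GNU sed, which understands \n in the replacement text.
split_every_third_comma() {
  # s/,/\n/3  turn the 3rd comma in the pattern space into a newline
  # P         print the pattern space up to the first newline
  # D         delete up to the first newline, then rerun the script
  #           on whatever is left without reading new input
  sed 's/,/\n/3;P;D'
}

printf '%s\n' 'Name,Age,Year,Michael,27,2018,Carl,19,2018' | split_every_third_comma
# Name,Age,Year
# Michael,27,2018
# Carl,19,2018
```

When fewer than three commas remain, the substitution does nothing, P prints the leftover text, and D empties the pattern space, which ends the loop.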

Upvotes: 0

Walter A

Reputation: 19982

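You are looking for groups of 3 fields, each containing no comma and separated by commas. The last group can cause problems: it does not end with a comma and may have only two fields.
The following command handles both cases, although every output line except the last keeps its trailing comma.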

grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile

Upvotes: 0

rici

Reputation: 241681

With GNU sed:

sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'

To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:

sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'

In the regular expression, [^,]*, is "zero or more characters other than , followed by a ,"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.

Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed, which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, used by default on Mac OS X. (Thanks to @EdMorton for the note.)
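As a sketch of the "one less than the number of fields" rule, the repetition count can be computed from a parameter instead of being edited by hand. The function name split_fields and its argument are hypothetical; the sed expression is the one shown above, and it still assumes GNU sed for \n in the replacement.

```shell
# Hypothetical helper: split_fields N puts N comma-separated fields on each line.
# Reads from stdin. Assumes GNU sed (-E and \n in the replacement).
split_fields() {
  n=$1
  # The interval count is N-1: that many "field," groups, then one more field,
  # then the comma that gets replaced by a newline.
  sed -E "s/(([^,]*,){$((n - 1))}[^,]*),/\\1\\n/g"
}

printf '%s\n' 'Name,Age,Year,Michael,27,2018,Carl,19,2018' | split_fields 3
# Name,Age,Year
# Michael,27,2018
# Carl,19,2018
```

The same caveat applies: quoted fields containing commas are not handled.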

Upvotes: 2

Ed Morton

Reputation: 203219

With GNU awk for multi-char RS:

$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018

With any awk:

$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
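If the number of fields per line needs to vary, the 3 can be passed in as an awk variable. This is only a sketch built on the "any awk" command above; the function name split_fields_awk and the variable n are hypothetical.

```shell
# Hypothetical helper: split_fields_awk N puts N comma-separated fields on each line.
# Reads from stdin; uses only the portable single-character RS.
split_fields_awk() {
  awk -v RS=',' -v n="$1" '
    {
      sub(/\n$/, "")                   # strip the newline left on the last field
      ORS = (NR % n ? "," : "\n")      # comma inside a row, newline after the Nth field
    }
    1                                  # print the record followed by the current ORS
  '
}

printf '%s\n' 'Name,Age,Year,Michael,27,2018,Carl,19,2018' | split_fields_awk 3
# Name,Age,Year
# Michael,27,2018
# Carl,19,2018
```

Note that if the total number of fields is not a multiple of n, the final line ends with a comma and no newline, so the input is assumed to contain complete rows.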

Upvotes: 1

AKS

Reputation: 17316

Try this:

$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811

$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Name,Age,Year
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981

The \{3\} interval in ,[1-9][0-9]\{3\} saves writing [0-9] three more times for the YYYY part; the equivalent long form is ,[1-9][0-9][0-9][0-9].

PS: This solution outputs only four digits (YYYY) for the year: even if the data contains a typo such as 19811, you'll still get 1981.

Upvotes: 0
