Marjer
Marjer

Reputation: 1403

move delimted fields into Newline with different delimted[Updated Again]

I have the comma seperated txt file which must have five columns only, but the some of the rows have more than 5 column.
I want to move the 6th to 10th to a newline, and 11th to 15th to a newline so on. And 6th, 11th 16th so on column has space deleimert instead of comma

Below is the conetent of input.txt

111 1, 2, 3, 4, 5
11 2, 13, 14, 15 5, 16 11, 17, 18, 19, 20
22, 23, 24, 25, 26 22, 27, 28, 29, 21 30, 31, 32, 3333 3, 34

Below is the conetent of Output.txt

111 1, 2, 3, 4, 5
11 2, 13, 14, 15 5, 16
11, 17, 18, 19, 20
22, 23, 24, 25, 26
22, 27, 28, 29, 21
30, 31, 32, 3333 3, 34

Upvotes: 0

Views: 81

Answers (3)

Walter A
Walter A

Reputation: 20002

I did not have time to convert the script below from ksh to bash, I leave that as a practice:

#!/bin/ksh
splitline() {
   echo $* | IFS=\, read f1 f2 f3 f4 f5
   # remove first space
   fx=${f5# }
   echo ${fx} | read f5a f5b
   echo "${f1},${f2},${f3},${f4}, ${f5a}"
   if [[ -n "${f5b}" ]]; then
      splitline ${f5b}
   fi
}

cat input.txt | while read line; do
   splitline $line
done

Upvotes: 0

John1024
John1024

Reputation: 113834

In the current (third) version of this question, it looks like we need to count four commas and then one more entity which is followed by a space (not a comma) and then add a line break at that point. If this is the case, then use:

$ sed  's/\(\([^,]\+,\)\{4\}[[:space:]]\+[[:alnum:]]\+\)[[:space:]]/\1\n/g' input.txt
111 1, 2, 3, 4, 5
11 2, 13, 14, 15 5, 16
11, 17, 18, 19, 20
22, 23, 24, 25, 26
22, 27, 28, 29, 21
30, 31, 32, 3333 3, 34

If your sed supports the -r flag (GNU), the appearance of the command can be slightly improved:

sed -r 's/(([^,]+,){4}[[:space:]]+[[:alnum:]]+)[[:space:]]/\1\n/g' input.txt

On OSX, the -r flag is not supported by -E should work instead:

sed -E 's/(([^,]+,){4}[[:space:]]+[[:alnum:]]+)[[:space:]]/\1\n/g' input.txt


Solution for the second version of this question

From the examples given, we need to insert a newline every time that an column ends with a space instead of a comma. If that is the case, then:

$ sed 's/\>[[:space:]]/\n/g' input.txt
1, 2, 3, 4, 5
12, 13, 14, 15, 16
11, 17, 18, 19, 20
22, 23, 24, 25, 26
22, 27, 28, 29, 21
30, 31, 32, 33, 34

The above works by looking for the end of a word, which sed denotes by \>, followed by whitespace of any type. It then replaces that space with a newline. Columns which are followed instead by a comma are left alone.

The sed substitute command has the form s/old/new/ if we want to replace the first occurrence only or s/old/new/g if we want to replace all such occurrences. Since we want all occurrences replaced, we use the g. In the command above, the "old" part is \>[[:space:]] which means end-of-a-word followed by whitespace of any type. The "new" part is just \n which stands for a newline.

sed also allows the option of changing your file in place:

sed -i 's/\>[[:space:]]/\n/g' input.txt

The -i option tells sed to change the input file in place. After running this command, input.txt will be updated.

Upvotes: 1

Akshay Hegde
Akshay Hegde

Reputation: 16997

Try :

$ cat f1
1,2,3,4,5
12,13,14,15,16 11,17,18,19,20
22,23,24,25,26 22,27,28,29,21 30,31,32,33,34

$ awk '1' RS=' |\n' f1
1,2,3,4,5
12,13,14,15,16
11,17,18,19,20
22,23,24,25,26
22,27,28,29,21
30,31,32,33,34

USER UPDATED INPUT ABOVE SOLUTION WON'T WORK

$ cat f2
1, 2, 3, 4, 5
12, 13, 14, 15, 16 11, 17, 18, 19, 20
22, 23, 24, 25, 26 22, 27, 28, 29, 21 30, 31, 32, 33, 34

$ awk '{gsub(/, /,",");gsub(/ /,"\n");gsub(/,/,", ")}1'  f2

OR

$ awk '{gsub(/[[:alnum:]] /,"&\n")}1' f2

1, 2, 3, 4, 5
12, 13, 14, 15, 16
11, 17, 18, 19, 20
22, 23, 24, 25, 26
22, 27, 28, 29, 21
30, 31, 32, 33, 34

Answer to below comment

gsub(/, /,",")     # Substitute comma for comma + space

gsub(/ /,"\n")     # So now (field + space + field) is left, substitute space with newline  

gsub(/,/,", ")     # substitute comma space (as you requested in expected output) for comma (first argument)

Upvotes: 1

Related Questions