Reputation: 87
I have a table with multiple columns and I would like to leave the first and second columns untouched, then change the delimiter for the remaining columns (column 3 onward) from tab to comma. Any suggestions with awk or sed would be helpful. Normally I would just do a find and replace with sed, but that also replaces the tabs in the first two columns.
An example test set:
M1 D.130 a a a
M2 D.104 - a ab
M3 D.150 ab ab a
M4 D.160 a a -
M5 D.107 a ab a
M6 D.107 - ab -
M7 D.104 a ab ab
Desired Output:
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
Upvotes: 1
Views: 160
Reputation: 203532
With GNU awk for the 3rd arg to match():
$ awk 'match($0,/(([^\t]+\t){2})(.*)/,a) {gsub(/\t/,",",a[3]); print a[1] a[3]}' file
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
With any awk:
$ awk 'match($0,/([^\t]+\t){2}/) {r=substr($0,RLENGTH+1); gsub(/\t/,",",r); print substr($0,1,RLENGTH) r}' file
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
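For readers who want the portable variant spelled out, here is a minimal sketch of the same match()/substr() idea wrapped in a shell function. The function name tail_tabs_to_commas and the variable names head and rest are made up for illustration and are not part of the answer above. The sketch writes the two-field prefix out explicitly rather than using {2}, on the assumption that some older awk builds lack interval support, and it passes lines with fewer than three columns through unchanged, which is a choice of this sketch.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): keep the first two
# tab-separated columns as they are and join the remaining columns
# with commas. Reads the named files, or stdin if none are given.
tail_tabs_to_commas() {
  awk '
    # match() sets RLENGTH to the length of the matched prefix, which
    # here is the first two fields together with their trailing tabs.
    match($0, /^[^\t]+\t[^\t]+\t/) {
      head = substr($0, 1, RLENGTH)    # first two columns, tabs included
      rest = substr($0, RLENGTH + 1)   # everything after the second tab
      gsub(/\t/, ",", rest)            # only the tail is rewritten
      print head rest
      next
    }
    { print }                          # fewer than three columns: unchanged
  ' "$@"
}
```

It would be called as tail_tabs_to_commas file, or at the end of a pipeline.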
Upvotes: 1
Reputation: 92854
Simple awk approach:
awk -F'\t' '{ r=$1 FS $2 FS $3; for(i=4;i<=NF;i++) r=r","$i; print r }' file
The output:
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
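The field loop above hard-codes the number of leading columns. As a rough sketch of how it could be parameterized, the function below takes that count as its first argument. The names join_tail_columns, keep, sep and line are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): the first argument is the
# number of leading columns to leave tab-separated; any further
# arguments are input files (stdin is read if none are given).
join_tail_columns() {
  keep=$1
  shift
  awk -F'\t' -v keep="$keep" '
    {
      line = $1
      for (i = 2; i <= NF; i++) {
        # A tab still precedes fields 2 .. keep+1, so the first kept-out
        # column stays separated from the leading ones; later fields
        # are joined with commas.
        sep = (i <= keep + 1) ? FS : ","
        line = line sep $i
      }
      print line
    }
  ' "$@"
}
```

For the table in the question this would be called as join_tail_columns 2 file.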
Upvotes: 0
Reputation: 16997
For the given input, you can simply use:
awk '{print $1, $2, $3 "," $4 "," $5}' infile
Or, more generally:
awk -v n=3 '{for(i=1; i<=NF; i++)printf("%s%s",$i,i==NF?ORS:i<n?OFS:",")}' infile
With gawk, to retain the original spacing:
awk -v n=3 '{
split($0,t,FS,d);
for(i=1; i<=NF; i++)
printf("%s%s",$i,i==NF?ORS:i<n?d[i]:",")
}' infile
Without gawk, to retain the original spacing:
awk -v n=3 '{
split($0,d,/[^[:space:]]*/);
for(i=1; i<=NF; i++)
printf("%s%s",$i,i==NF?ORS:i<n?d[i+1]:",")
}' infile
For example:
$ cat infile
M1 D.130 a a a
M2 D.104 - a ab
M3 D.150 ab ab a
M4 D.160 a a -
M5 D.107 a ab a
M6 D.107 - ab -
M7 D.104 a ab ab
$ awk -v n=3 '{for(i=1; i<=NF; i++)printf("%s%s",$i,i==NF?ORS:i<n?OFS:",")}' infile
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
With gawk, to retain the original spacing:
$ awk -v n=3 '{split($0,t,FS,d);for(i=1; i<=NF; i++)printf("%s%s",$i,i==NF?ORS:i<n?d[i]:",")}' infile
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
Without gawk, to retain the original spacing:
$ awk -v n=3 '{split($0,d,/[^[:space:]]*/);for(i=1; i<=NF; i++)printf("%s%s",$i,i==NF?ORS:i<n?d[i+1]:",")}' infile
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
Upvotes: 3
Reputation: 3089
You can use this awk command:
$ awk '{a=""; for(i=3; i<NF; i++) a=a $i ","; print $1, $2, a $NF}' file
M1 D.130 a,a,a
M2 D.104 -,a,ab
M3 D.150 ab,ab,a
M4 D.160 a,a,-
M5 D.107 a,ab,a
M6 D.107 -,ab,-
M7 D.104 a,ab,ab
Upvotes: 0