Reputation: 359

Parse a tab separated file that contains commas in the fields with awk

I would like to remove the value after each ":" in the 2nd field of the following input file using awk. As the expected output shows, the surrounding brackets should go as well.

Input

text1   [a:2,b:1,c:4,k:0]
text2   [d:1,a:5,f:3.2]

Output

text1   a,b,c,k
text2   d,a,f

I was able to do this using R but that was kind of slow on larger files.

Upvotes: 1

Answers (2)

glenn jackman

Reputation: 246754

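Another approach: split the 2nd field on brackets, commas and colons, then print every other piece. Note that this joins the two columns with a single space rather than a tab.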

awk '{
  printf "%s ", $1
  n = split($2, a, /[][,:]/)
  sep=""
  for (i=2; i<n; i+=2) {
    printf "%s%s", sep, a[i]
    sep=","
  }
  print ""
}' <<END
text1   [a:2,b:1,c:4,k:0]
text2   [d:1,a:5,f:3.2]
END

Output:

text1 a,b,c,k
text2 d,a,f

Upvotes: 1

jaypal singh

Reputation: 77085

Using sed (assuming you want to remove brackets too):

$ sed 's/\[\|:[^,]*//g' file
text1   a,b,c,k
text2   d,a,f
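
The `\|` alternation inside a basic regular expression is a GNU extension, so the sed command may behave differently on BSD or macOS sed. A possible variant, assuming a sed that accepts -E for extended regular expressions (current GNU and BSD versions do), is sketched here. The sample input is fed through printf only to keep the example self-contained.

```shell
# Same substitution written as an extended regular expression (-E),
# where alternation is a plain "|" instead of the GNU-only "\|".
printf 'text1\t[a:2,b:1,c:4,k:0]\ntext2\t[d:1,a:5,f:3.2]\n' |
  sed -E 's/\[|:[^,]*//g'
```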

Using awk:

$ awk '{gsub(/\[|:[^,]*/,"")}1' file
text1   a,b,c,k
text2   d,a,f
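
The one-liners in this answer edit the whole line, so a "[" or ":" in the first column would be altered too. If that matters, one option is to split on tabs explicitly and restrict the substitution to the 2nd field. This is a minimal sketch, not part of the original answer: the function name strip_values is hypothetical, and it assumes the columns are separated by a single tab.

```shell
# Hypothetical helper: clean only the 2nd tab-separated field.
strip_values() {
  awk -F'\t' -v OFS='\t' '{
    # Remove "[" and "]" plus every ":value" run up to the next comma.
    gsub(/[][]|:[^,]*/, "", $2)
    print   # $0 is rebuilt with OFS, so the tab is preserved
  }' "$@"
}

printf 'text1\t[a:2,b:1,c:4,k:0]\ntext2\t[d:1,a:5,f:3.2]\n' | strip_values
```

Setting OFS to a tab is needed because modifying $2 makes awk rebuild the record with the output separator, which is a space by default.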

Upvotes: 7
