Teflon Ted

Reputation: 8856

Ignoring escaped delimiters (commas) with awk?

If I had a string with escaped commas like so:

a,b,{c\,d\,e},f,g

How might I use awk to parse that into the following items?

a
b
{c\,d\,e}
f
g

Upvotes: 0

Views: 1620

Answers (3)

system PAUSE

Reputation: 38550

{
   n = split($0, a, /,/)
   j = 0
   for (i = 1; i <= n; ++i) {
      if (j > 0 && b[j] ~ /\\$/) {
         b[j] = b[j] "," a[i]
      } else {
         b[++j] = a[i]
      }
   }
   for (k = 1; k <= j; ++k) {
      print b[k]
   }
}
  1. Split the line into array a, using ',' as the delimiter
  2. Build array b from a, appending each piece to the previous item when that item ends in '\'
  3. Print array b (j counts the items for this line, so stale entries from earlier lines are never printed)

This solution presumes (for now) that ',' is the only character that is ever escaped with '\'--that is, there is no need to handle any \\ in the input, nor weird combinations such as \\\,\\,\\\\,,\,.
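
If escaped backslashes do need handling, one option is to scan the line one character at a time instead of splitting first. The following is only a sketch under that assumption, not part of the answer above: it treats a backslash as escaping whatever character follows it, so `\\,` reads as an escaped backslash followed by a real delimiter. The shell wrapper and the name `split_escaped` are hypothetical, there only to make the awk program easy to run.

```sh
# split_escaped: hypothetical helper name, used only for illustration.
# Reads lines on stdin and prints one field per line, escapes left intact.
split_escaped() {
  awk '{
    field = ""
    len = length($0)
    for (i = 1; i <= len; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < len) {
        # A backslash escapes the next character, whatever it is;
        # copy both so the escape survives in the output.
        field = field c substr($0, i + 1, 1)
        i++
      } else if (c == ",") {
        # Unescaped comma: the current field is complete.
        print field
        field = ""
      } else {
        field = field c
      }
    }
    # Whatever is left after the last delimiter is the final field.
    print field
  }'
}
```

For example, `printf '%s\n' 'a,b,{c\,d\,e},f,g' | split_escaped` prints the five items from the question, one per line.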

Upvotes: 2

Cascabel

Reputation: 497302

I don't think awk has any built-in support for something like this. Here's a solution that's not nearly as short as DigitalRoss's, but it runs no risk of the made-up placeholder string (!Q!) accidentally appearing in your input. Since it tests each field with an if, you could also extend it to check whether a field actually ends in \\, in which case the backslash itself is escaped and the comma that follows is a real delimiter, not an escaped comma.

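BEGIN {
    FS = ","
}

{
    # Clear results left over from the previous record.
    split("", fields)
    curfield = 1
    for (i = 1; i <= NF; i++) {
        if (substr($i, length($i)) == "\\") {
            # Field ends in a backslash, so the comma was escaped: drop
            # the backslash, restore the comma, and keep accumulating.
            fields[curfield] = fields[curfield] substr($i, 1, length($i) - 1) FS
        } else {
            fields[curfield] = fields[curfield] $i
            curfield++
        }
    }
    nf = curfield - 1
    for (i = 1; i <= nf; i++) {
        printf("%d: %s   ", i, fields[i])
    }
    printf("\n")
}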

Upvotes: 1

DigitalRoss

Reputation: 146141

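{
  # Hide each escaped comma behind a placeholder that must not occur in the input.
  gsub("\\\\,", "!Q!")
  n = split($0, a, ",")
  for (i = 1; i <= n; ++i) {
    # Put the escaped comma back inside each field.
    gsub("!Q!", "\\,", a[i])
    print a[i]
  }
}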
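
A possible variation, sketched here rather than taken from the answer: use a control character as the placeholder instead of a printable string, which makes a collision with real input less likely, though still not impossible. This assumes the input never contains awk's default SUBSEP character ("\034"). The shell wrapper and the name `split_placeholder` are hypothetical.

```sh
# split_placeholder: hypothetical helper name, used only for illustration.
split_placeholder() {
  awk '{
    # SUBSEP defaults to "\034"; assumed here never to occur in the input.
    gsub(/\\,/, SUBSEP)
    n = split($0, a, ",")
    for (i = 1; i <= n; ++i) {
      # Swap the placeholder back to the escaped comma in each field.
      gsub(SUBSEP, "\\,", a[i])
      print a[i]
    }
  }'
}
```

It is used the same way as the script above, for example `printf '%s\n' 'a,b,{c\,d\,e},f,g' | split_placeholder`.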

Upvotes: 2
