JBoy

Reputation: 5735

bash / awk inner deletion

I need some advice on a Bash one-liner that I'm trying to write with awk.

Basically, I have a variable holding comma-separated values, like so:

"abc,abd,abf,abz,abz"

Getting each field is easy with a simple awk loop:

echo "${var}" | awk -F',' '{for (i=1; i<=NF; i++) print $i}'

The problem is that sometimes one of these comma-separated values is a quoted string with commas inside it, e.g.:

"abc,"abd,abf,abz",abh,abr,alk"

In this case "abd,abf,abz" is a single value. I need to tell awk that whatever is between quotes must be treated as a whole value and not split, but I'm getting nowhere. Any advice?

Upvotes: 1

Views: 141

Answers (4)

Paul-Beyond

Reputation: 1737

Check out the csvtool program, which lets you manipulate CSV files.

It can be installed with apt-get (or whatever your package manager is) and used in your Bash scripts to work with CSV files.

Upvotes: 0

Chris Seymour

Reputation: 85785

Firstly, you don't need to loop at all for the first example:

$ awk '{print}' RS=',' <<< 'abc,abd,abf,abz,abz'
abc
abd
abf
abz
abz

For the second example you really want a proper CSV parser. Here is a Python solution:

#!/usr/bin/env python
from csv import reader, writer
from sys import stdin, stdout
writer(stdout, delimiter='\n').writerows(reader(stdin))

Demo:

$ cat file
abc,"abd,abf,abz",abh,abr,alk

$ csv_delimiter.py < file 
abc
abd,abf,abz
abh
abr
alk
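
Since the values in the question live in a shell variable rather than a file, the same parser can also be applied to a single string. This is only a minimal sketch; the helper name split_csv_line is made up for illustration and is not part of the csv module:

```python
import csv


def split_csv_line(line):
    """Return the fields of one CSV line, keeping quoted values whole.

    split_csv_line is a hypothetical helper name used for this sketch.
    csv.reader accepts any iterable of strings, so a one-element list
    stands in for a file here.
    """
    # An empty input yields no rows at all, so fall back to an empty list.
    return next(csv.reader([line]), [])


if __name__ == "__main__":
    for field in split_csv_line('abc,"abd,abf,abz",abh,abr,alk'):
        print(field)
```

The csv module strips the surrounding double quotes and treats the commas inside them as data, which is exactly the behaviour the question asks for.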

Upvotes: 1

Taoufix

Reputation: 372

The best I could do with awk:

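$ echo 'abc,"xxx,yyy,zzz",abh,abr,alk' | awk -F'"' '{
    for (i=1; i<=NF; i++) {
      if (i % 2 == 0) {
        # even-numbered fields were inside the double quotes
        printf "\"%s\"\n", $i
      } else {
        n = split($i, array, ",")
        for (j=1; j<=n; j++) {
          # skip the empty pieces left by a comma next to a quote
          if (array[j] != "") print array[j]
        }
      }
    }
  }'
abc
"xxx,yyy,zzz"
abh
abr
alk

An earlier version of this printed empty lines: splitting on commas leaves an empty piece wherever a comma sits right next to a quote. The loop now skips those empty pieces, which also means genuinely empty fields are dropped.

Update: Fixed + indented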

Upvotes: 1

Ed Morton

Reputation: 203512

If the first and last double quotes shown in your sample input are not actually present in your input, then:

$ echo 'abc,"abd,abf,abz",abh,abr,alk' |
awk -F\" '{
    for (i=1;i<=NF;i++) {
        if (i%2) {
            gsub(/^,|,$/,"",$i)
            nf = split($i,a,/,/)
            for (j=1; j<=nf; j++) {
                print a[j]
            }
        }
        else {
            print $i
        }
    }
}'
abc
abd,abf,abz
abh
abr
alk

If they are present, then:

$ echo '"abc,"abd,abf,abz",abh,abr,alk"' |
awk -F\" '{
    for (i=2;i<NF;i++) {
        if ( !(i%2) ) {
            gsub(/^,|,$/,"",$i)
            nf = split($i,a,/,/)
            for (j=1; j<=nf; j++) {
                print a[j]
            }
        }
        else {
            print $i
        }
    }
}'
abc
abd,abf,abz
abh
abr
alk
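
For readers who find the odd/even field test hard to follow, here is the idea behind the first awk script (the one without the outer quotes) spelled out in Python. It is only a sketch under stated assumptions: the function name split_on_quotes is invented for illustration, the input is assumed to have balanced double quotes, and escaped quotes ("") inside a value are not handled.

```python
def split_on_quotes(line):
    """Split on commas, keeping double-quoted runs as single values.

    split_on_quotes is a hypothetical name for this sketch. Splitting
    on '"' first makes the chunks alternate: even indexes (Python counts
    from 0) are outside quotes, odd indexes are inside quotes.
    """
    values = []
    for index, chunk in enumerate(line.split('"')):
        if index % 2:
            # Inside quotes: keep the chunk as one value, commas and all.
            values.append(chunk)
            continue
        # Outside quotes: drop the one comma that touched a quote on
        # either side, like gsub(/^,|,$/,"",$i) in the awk version.
        if chunk.startswith(','):
            chunk = chunk[1:]
        if chunk.endswith(','):
            chunk = chunk[:-1]
        if chunk:
            values.extend(chunk.split(','))
    return values


if __name__ == "__main__":
    for value in split_on_quotes('abc,"abd,abf,abz",abh,abr,alk'):
        print(value)
```

awk numbers fields from 1, so its "odd" fields correspond to the even indexes here. When the whole input is itself wrapped in quotes, the parity flips, which is why the second awk script tests !(i%2) and skips the first and last fields.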

Upvotes: 1
