user4993731

How to process a file using awk

I want to process a file using awk, but I am stuck on the fourth field: it contains commas inside the quotes, so awk splits it at the first comma.

Data (test.txt):

"A","B","ls","This,is,the,test"
"k","O","mv","This,is,the,2nd test"
"C","J","cd","This,is,the,3rd test"

cat test.txt | awk -F , '{ OFS="|"; print $2, $3, $4 }'

Output:

"B"|"ls"|"This
"O"|"mv"|"This
"J"|"cd"|"This

But the output should look like this:

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

Any ideas?

Upvotes: 1

Views: 220

Answers (5)

Claes Wikner

Reputation: 1517

Strip the first field, then turn each "," separator into "|":

awk '{ sub(/^"[^"]*",/, ""); gsub(/","/, "\042|\042") } 1' file

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

Upvotes: 0

Ed Morton

Reputation: 203502

Use GNU awk for FPAT:

$ awk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS='|' '{print $2,$3,$4}' file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content
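
The FPAT approach describes what a field looks like rather than what separates fields. As a rough illustration of that idea outside awk, here is a minimal Python sketch; the function name split_by_content is made up for this example. Python's re module tries alternatives left to right instead of picking the longest match, so the quoted alternative is listed first here. Like the pattern above, it assumes no empty fields and no escaped quotes inside a field.

```python
import re

# A field is either a double-quoted string or a run of non-comma characters.
FIELD = re.compile(r'"[^"]+"|[^,]+')

def split_by_content(line):
    # Return every substring that looks like a field, in order.
    return FIELD.findall(line)

fields = split_by_content('"A","B","ls","This,is,the,test"')
print("|".join(fields[1:4]))
# "B"|"ls"|"This,is,the,test"
```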

With other awks you'd do:

$ cat tst.awk
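BEGIN { OFS="|" }
{
    nf=0
    delete f
    # Repeatedly take the next field off the front of the record:
    # either a run of non-commas or a double-quoted string.
    while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) {
        f[++nf] = substr($0,RSTART,RLENGTH)
        # Drop everything up to the end of the field just matched.
        $0 = substr($0,RSTART+RLENGTH)
    }
    print f[2], f[3], f[4]
}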

$ awk -f tst.awk file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

Upvotes: 2

Rakholiya Jenish

Reputation: 3223

Using awk, you can also use:

awk -F'\",\"' 'BEGIN{OFS="\"|\""}{print "\""$2,$3,$4}' filename

Note: This only works if the sequence "," (quote, comma, quote) never occurs inside a field, because that sequence is used as the field separator.

Output:

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

Or, somewhat better:

awk -F'^\"|\",\"|\"$' 'BEGIN{OFS="\"|\""}{print "\""$3,$4,$5"\""}' filename
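
To see why the separator-based commands depend on that assumption, here is a small Python sketch. The helper name and the sample line with escaped quotes are made up for illustration and are not taken from the question. It splits on the same three-character separator and compares the result with Python's csv module.

```python
import csv

def split_on_separator(line):
    # Drop the outer quotes, then split on the three-character separator ","
    return line[1:-1].split('","')

ok = '"A","B","ls","This,is,the,test"'
# Hypothetical line: the last field holds the text  say "a","b"
# with its quotes doubled, as CSV requires.
tricky = '"A","B","ls","say ""a"",""b"""'

print(len(split_on_separator(ok)))      # 4 fields, as expected
print(len(split_on_separator(tricky)))  # 5 pieces: the last field was cut in two
print(len(next(csv.reader([tricky]))))  # 4 fields: a CSV parser keeps it whole
```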

Upvotes: 5

csiu

Reputation: 3269

In awk:

awk -F'"' '{for(i=4;i<=8;i+=2) {if(i==4){s="\""$i"\""}else{s = s "|\"" $i"\""}}; print s}' test.txt

Explanation

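  • -F'"' sets the double quote as the field separator, so the quoted values land in the even-numbered fields and the commas between them in the odd-numbered ones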
  • awk explanation:

    {
    ## use for-loop to go over fields
    ## skips the comma field (i.e. increment by +2)
    ## OP wanted to start at field 2, this means the 4th term
    ## OP wanted to end at field 4, this means the 8th term
    for(i=4;i<=8;i+=2) {
    
        if(i==4){
            ## initialization
            ## use variable s to hold output (i.e. quoted first field $i)
            s="\"" $i "\""
        } else {
            ## for rest of field $i,
            ## prepend '|' and add quotes around $i
            s = s "|\"" $i "\""
        }
    };
    
    ## print output
    print s 
    }
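
To make the field numbering in the explanation concrete, here is a small Python sketch of the same split (the function name is made up for this example). awk numbers fields from 1 while Python lists start at 0, so awk's $4, $6 and $8 are indexes 3, 5 and 7 here.

```python
def quoted_fields_2_to_4(line):
    # Splitting on the double quote puts the quoted values at awk fields
    # 2, 4, 6, 8 (Python indexes 1, 3, 5, 7) and the commas in between.
    parts = line.split('"')
    wanted = parts[3:8:2]  # awk's $4, $6, $8
    # Put the quotes back around each value and join with '|'
    return "|".join('"' + p + '"' for p in wanted)

print(quoted_fields_2_to_4('"A","B","ls","This,is,the,test"'))
# "B"|"ls"|"This,is,the,test"
```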
    

Upvotes: 1

Birei

Reputation: 36262

I don't much like awk for this kind of task. My suggestion is to use a proper parser; Python, for example, has a built-in csv module to handle this. You can use it like:

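import csv
import sys

# newline='' lets the csv module handle line endings itself
with open(sys.argv[1], 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    # Quote every field, join with '|', and end each row with a plain newline
    csvwriter = csv.writer(sys.stdout, delimiter='|', quoting=csv.QUOTE_ALL,
                           lineterminator='\n')
    for row in csvreader:
        # Skip the first column and write the rest
        csvwriter.writerow(row[1:])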

And run it like:

python3 script.py infile

That prints the following to stdout:

"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"

Upvotes: 0
