Reputation:
I want to read a file using awk,
but I'm stuck on the fourth field: it gets cut off at the first comma inside it.
Data (test.txt):
"A","B","ls","This,is,the,test"
"k","O","mv","This,is,the,2nd test"
"C","J","cd","This,is,the,3rd test"
cat test.txt | awk -F , '{ OFS="|"; print $2, $3, $4 }'
Output:
"B"|"ls"|"This
"O"|"mv"|"This
"J"|"cd"|"This
But the output should look like this:
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
Any ideas?
Upvotes: 1
Views: 220
Reputation: 1517
awk '{sub(/^..../,""); gsub(/","/,"\042|\042")}1' file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
Upvotes: 0
Reputation: 203502
Use GNU awk for FPAT:
$ awk -v FPAT='([^,]+)|(\"[^\"]+\")' -v OFS='|' '{print $2,$3,$4}' file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content
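For readers who want to see the "describe the field, not the separator" idea outside of awk, here is a rough Python sketch. It is an illustration only, not part of the gawk solution: the name split_by_content is made up, and the two alternatives are swapped because Python's re module takes the first alternative that matches instead of the longest one. Like the pattern above, it does not handle empty fields or quotes escaped inside a field.

```python
import re

# Describe what a field looks like instead of what separates fields.
# The quoted alternative comes first: Python's re picks the first alternative
# that matches, and [^,]+ alone would stop at the first comma inside the quotes.
FIELD = re.compile(r'"[^"]+"|[^,]+')

def split_by_content(line):
    """Return the fields of one line, keeping commas that sit inside double quotes."""
    return FIELD.findall(line)

line = '"A","B","ls","This,is,the,test"'
fields = split_by_content(line)
# Fields 2 to 4, joined with '|' as in the awk command
print("|".join(fields[1:4]))
```

For the sample line this prints "B"|"ls"|"This,is,the,test".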
With other awks you'd do:
$ cat tst.awk
BEGIN { OFS="|" }
{
    nf=0
    delete f
    while ( match($0,/([^,]+)|(\"[^\"]+\")/) ) {
        f[++nf] = substr($0,RSTART,RLENGTH)
        $0 = substr($0,RSTART+RLENGTH)
    }
    print f[2], f[3], f[4]
}
$ awk -f tst.awk file
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
Upvotes: 2
Reputation: 3223
Using awk, you can also use:
awk -F'\",\"' 'BEGIN{OFS="\"|\""}{print "\""$2,$3,$4}' filename
Note: This only works if the three-character sequence "," never appears inside a field, because that sequence is what is used as the field separator.
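The limitation in the note can be seen with a small Python sketch that splits on the same three-character sequence. This is only an illustration under assumptions: the function name is hypothetical, and the tricky line assumes CSV-style doubled quotes ("") inside a quoted field, which the sample data in the question does not contain.

```python
import csv

def split_on_quote_comma_quote(line):
    """Strip the outer quotes, then split on the literal sequence ","."""
    return line[1:-1].split('","')

# Works for the data in the question
simple = '"A","B","ls","This,is,the,test"'
print(split_on_quote_comma_quote(simple))

# Three fields, but the middle one contains "," written with doubled quotes
tricky = '"x","he said "","" ok","z"'
naive = split_on_quote_comma_quote(tricky)
parsed = next(csv.reader([tricky]))
print(len(naive), len(parsed))
```

On the tricky line the plain split reports four pieces while the csv module reports three fields, which is the failure mode the note warns about.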
Output:
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
Or, somewhat better:
awk -F'^\"|\",\"|\"$' 'BEGIN{OFS="\"|\""}{print "\""$3,$4,$5"\""}' filename
Upvotes: 5
Reputation: 3269
In awk:
awk -F'"' '{for(i=4;i<=8;i+=2) {if(i==4){s="\""$i"\""}else{s = s "|\"" $i"\""}}; print s}' test.txt
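Explanation:
-F'"' sets the double quote as the field separator, so the quoted values land in the even-numbered fields ($2, $4, $6, $8) and the commas between them land in the odd-numbered ones.
The awk program, annotated: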
{
    ## use for-loop to go over fields
    ## skips the comma field (i.e. increment by +2)
    ## OP wanted to start at field 2, this means the 4th term
    ## OP wanted to end at field 4, this means the 8th term
    for(i=4;i<=8;i+=2) {
        if(i==4){
            ## initialization
            ## use variable s to hold output (i.e. quoted first field $i)
            s="\"" $i "\""
        } else {
            ## for rest of field $i,
            ## prepend '|' and add quotes around $i
            s = s "|\"" $i "\""
        }
    };
    ## print output
    print s
}
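The index arithmetic in the explanation above (quoted value n sits in awk field 2n) may be easier to check in a short Python sketch. The function name pick_quoted and its parameters are made up for illustration. Python lists are 0-based, so awk's $4 corresponds to index 3 here.

```python
def pick_quoted(line, first=2, last=4):
    """Split on double quotes and re-join quoted values first..last with '|'."""
    parts = line.split('"')
    # parts looks like ['', 'A', ',', 'B', ',', 'ls', ',', 'This,is,the,test', '']
    # Quoted value n (1-based) is awk field 2n, i.e. list index 2n - 1
    wanted = [parts[2 * n - 1] for n in range(first, last + 1)]
    return "|".join('"' + value + '"' for value in wanted)

print(pick_quoted('"A","B","ls","This,is,the,test"'))
```

As with the awk version, this assumes every field is quoted and that no field contains a double quote.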
Upvotes: 1
Reputation: 36262
I don't like awk much for this kind of task. My suggestion is to use a CSV parser. Python, for example, has a built-in csv module that handles this. You can use it like:
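import csv
import sys

# newline='' lets the csv module handle line endings itself
with open(sys.argv[1], 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    # quote every output field, separate fields with '|', end lines with '\n'
    csvwriter = csv.writer(sys.stdout, delimiter='|', quoting=csv.QUOTE_ALL, lineterminator='\n')
    for row in csvreader:
        # drop the first column, keep the rest
        csvwriter.writerow(row[1:])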
And run it like:
python3 script.py infile
That prints to stdout:
"B"|"ls"|"This,is,the,test"
"O"|"mv"|"This,is,the,2nd test"
"J"|"cd"|"This,is,the,3rd test"
Upvotes: 0