saurav
saurav

Reputation: 219

awk in script to copy data to a file

I have copied some columns data to some file and then tried to write one column data to another file. But am getting few wrong

This is my input file:-

,E2Bn9,2015-04-29 00:00:00-0500

['2C173'],E2BA8,2015-04-29 00:00:00-0500

['5A475','2C174'],E2BA8,2015-06-29 00:00:00-0400

I used the awk, sed commands as follows

sed -i 's/",/|/g' tempFile
awk -F '[|,]' '{ print "update table set cola = " $1 " where colb = " $2 " and colc = " $3 }' tempFile > updatestmt.cql

I got the output as

update table set cola = where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500

update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500

update table set cola = "['5A475' where colb =  '2C174'] and colc = E2BA8

1st two rows seems fine but last row it is printing wrong value.

I want the last row as

update table set cola = "['5A475','2C174'] where colb =E2BA8 and colc = 2015-06-29 00:00:00-0400

Upvotes: 0

Views: 317

Answers (4)

I chose a different approach, so I could avoid using too complex reg-exp and it works with any old awk.

# cat tst.awk
        {s="";}
$1!=""  {for(i=1;i<NF-1;i++)s=s (i==1?"":",") $i;}
        {printf("update table set cola = %s where colb = %s and colc = %s\n",s,$(NF-1),$NF);}

# awk -F, -f tst.awk yourinpfile
update table set cola =  where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['5A475','2C174'] where colb = E2BA8 and colc = 2015-06-29 00:00:00-0400

I agree with Ed that without loop we have a nicer solution, but I can reuse my original assumption that $(NF-1) and $NF is fix, to keep the simpler reg-exp.

        {s="";}
$1!=""  {s=$0;sub("," $(NF-1) "," $NF, "", s);}
        {printf("update table set cola = %s where colb = %s and colc = %s\n",s,$(NF-1),$NF);}

Upvotes: 1

SLePort
SLePort

Reputation: 15461

If only list values are quoted as in your sample code, you can try this sed ;

sed "s/' *, *'/' '/g;s/\([^,]*\),\([^,]*\),\(.*\)/update table set cola = \1 where colb = \2 and colc = \3/;s/' '/','/g" file

Upvotes: 0

Ed Morton
Ed Morton

Reputation: 203532

With GNU awk 4.* for FPAT:

$ awk -v FPAT='([^,]*)|([[][^]]+[]])' '{print "update table set cola =", $1, "where colb =", $2, "and colc =", $3}' file
update table set cola =  where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['5A475','2C174'] where colb = E2BA8 and colc = 2015-06-29 00:00:00-0400

See http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content.

With non-gawk awks or pre-4.0 versions of gawk (get a modern gawk!) you can use:

$ cat tst.awk
{
    delete f
    nf = 0
    tail = $0
    while ( (tail!="") && match(tail,/([^,]*)|([[][^]]+[]])/) ) {
        f[++nf] = substr(tail,RSTART,RLENGTH)
        tail = substr(tail,RSTART+RLENGTH+1)
    }
    print "update table set cola =", f[1], "where colb =", f[2], "and colc =", f[3]
}

$ awk -f tst.awk file
update table set cola =  where colb = E2Bn9 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['2C173'] where colb = E2BA8 and colc = 2015-04-29 00:00:00-0500
update table set cola = ['5A475','2C174'] where colb = E2BA8 and colc = 2015-06-29 00:00:00-0400

You can use $0 instead of f[] but then there's a performance overhead as the record gets re-split every time you assign to $(++nf) and there may be cases where you want to use the original $0 later.

Upvotes: 4

Lars Fischer
Lars Fischer

Reputation: 10149

The field separators in your data are causing the problems, to be precise the comma inside the brackets in the third line. A workaround could be a different sed that only converts , to | outside of the first bracket and using FS='|':

sed -r 's/(.*\])?.*,/\1|/g'  yourfile | awk -F '|' ....

where .... stand for the rest of your awk script.

Upvotes: 0

Related Questions