RebeA

Reputation: 11

Formatting data using AWK

File Content

662293,211,sname in ('Market District', 'Express', 'Market', 'Market District Express')

62871,3506,RTANAME in ('ALLIANCE TA')

AWK command Used

awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}'

Output Obtained

in ('Market District', 'Express', 'Market', 'Market District Express')

in ('ALLIANCE TA')

Can you guide me on how I can use AWK to get the following format in the output?

Expected Output

sname in ('Market District', 'Express', 'Market', 'Market District Express')

RTANAME in ('ALLIANCE TA')

Upvotes: 1

Views: 618

Answers (2)

Renaud Pacalet

Reputation: 29403

You simply forgot to tell awk that the field separator is a comma instead of the default whitespace. Moreover, awk fields are numbered starting from 1, not 0. So, if you want to skip the first 2 fields, you must start your for loop at i=3:

awk -F, '{for(i=3; i<NF; i++) printf "%s,", $i; print $NF}'

Note that once you tell awk that the comma is the field separator, the commas are no longer part of the fields, so printf $i will not print them any more. This is why, to preserve them, I replaced your:

printf $i " "

by:

printf "%s,", $i
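To see why the original command lost sname, it can help to print how awk splits the first line under each separator. This is only a small sketch: the scratch file created with mktemp is an assumption for illustration, and the two sample lines are copied from the question.

```shell
# Scratch file holding the two sample lines from the question
# (the temporary file is just for this illustration).
tmp=$(mktemp)
printf '%s\n' \
  "662293,211,sname in ('Market District', 'Express', 'Market', 'Market District Express')" \
  "62871,3506,RTANAME in ('ALLIANCE TA')" > "$tmp"

# Default FS (runs of blanks): the first field swallows "662293,211,sname".
default_split=$(awk 'NR==1 {print NF ": " $1}' "$tmp")

# FS set to a comma: the first field is only the leading number.
comma_split=$(awk -F, 'NR==1 {print NF ": " $1}' "$tmp")

rm -f "$tmp"

echo "$default_split"   # 9: 662293,211,sname
echo "$comma_split"     # 6: 662293
```

With the default separator, sname is glued to the first field, so a loop starting at i=2 begins at "in".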

Another option, if your awk supports this (non-POSIX), consists in shifting all fields by two positions, discarding the last 2, and also using the comma as the output field separator (OFS):

awk -F, 'NF>=2{for(i=1;i<=NF-2;i++)$i=$(i+2);NF-=2}1' OFS=,

But it's probably not easier to understand (the final 1 is a strange awk-ism: it is an always-true pattern with no action, so awk applies the default action, which is the same as {print}).
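As a rough check of the remark about the final 1, the sketch below runs the field-shifting command twice on the sample lines from the question, once with the trailing 1 and once with an explicit {print}. The scratch file is an assumption for illustration, and the whole thing relies on your awk rebuilding the record when NF is decreased, which is the non-POSIX part mentioned above.

```shell
# Scratch file with the sample lines from the question (illustration only).
tmp=$(mktemp)
printf '%s\n' \
  "662293,211,sname in ('Market District', 'Express', 'Market', 'Market District Express')" \
  "62871,3506,RTANAME in ('ALLIANCE TA')" > "$tmp"

# Variant from the answer: the trailing 1 is a pattern without an action.
with_one=$(awk -F, 'NF>=2{for(i=1;i<=NF-2;i++)$i=$(i+2);NF-=2}1' OFS=, "$tmp")

# Same program with the default action spelled out.
with_print=$(awk -F, 'NF>=2{for(i=1;i<=NF-2;i++)$i=$(i+2);NF-=2}{print}' OFS=, "$tmp")

rm -f "$tmp"

# Both variants should print the same two lines.
[ "$with_one" = "$with_print" ] && echo "same output"
printf '%s\n' "$with_one"
```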

Upvotes: 1

Ed Morton

Reputation: 204731

As @RenaudPacalet already mentioned, the main problem with your script was not setting FS. In addition, you were using printf $i instead of printf "%s", $i, which would fail if your input contained printf formatting chars such as %s, and you were using " " as the output field separator instead of setting and printing OFS. Here's how to write your code correctly using any awk:

$ awk 'BEGIN{FS=OFS=","} {for (i=3; i<NF; i++) printf "%s%s", $i, OFS; print $NF}' file
sname in ('Market District', 'Express', 'Market', 'Market District Express')
RTANAME in ('ALLIANCE TA')

but, more idiomatically, printing a set of values in a loop is written as:

$ awk 'BEGIN{FS=OFS=","} {for (i=3; i<=NF; i++) printf "%s%s", $i, (i<NF ? OFS : ORS)}' file
sname in ('Market District', 'Express', 'Market', 'Market District Express')
RTANAME in ('ALLIANCE TA')

so you don't have to specify the printf formatting string twice (no big deal in your case but can be an issue for other formatting strings) - once for everything before NF and then again for NF.
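The earlier warning about printf $i can be illustrated with a made-up input line (not from the question) that contains a doubled percent sign. What happens with a lone %s in the data differs between awk implementations, so this sketch sticks to %%, which printf collapses to a single %.

```shell
# Hypothetical input line whose second field contains "%%".
line='rate,100%% done'

# Field used as the format string: printf interprets "%%" as a literal "%".
unsafe=$(printf '%s\n' "$line" | awk -F, '{printf $2; print ""}')

# Field passed as data to a fixed "%s" format: printed verbatim.
safe=$(printf '%s\n' "$line" | awk -F, '{printf "%s\n", $2}')

echo "$unsafe"   # 100% done
echo "$safe"     # 100%% done
```

The data is silently altered in the first case, which is why a fixed format string is the safer habit.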

In your case though, the right thing to do if you were using awk is to just remove the first 2 comma-separated fields with a regexp:

$ awk '{sub(/([^,]*,){2}/,"")} 1' file
sname in ('Market District', 'Express', 'Market', 'Market District Express')
RTANAME in ('ALLIANCE TA')

which would be more appropriate for sed as it's a simple substitution:

$ sed -E 's/([^,]*,){2}//' file
sname in ('Market District', 'Express', 'Market', 'Market District Express')
RTANAME in ('ALLIANCE TA')

but in reality, this is a job for cut:

$ cut -d, -f3- file
sname in ('Market District', 'Express', 'Market', 'Market District Express')
RTANAME in ('ALLIANCE TA')
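If you want to convince yourself that the tools agree, a quick comparison on the sample lines from the question could look like the sketch below. The scratch file is an assumption for illustration. The awk variant here calls sub() twice instead of using the {2} interval, on the assumption that some older awk builds may not support interval expressions.

```shell
# Scratch file with the sample lines from the question (illustration only).
tmp=$(mktemp)
printf '%s\n' \
  "662293,211,sname in ('Market District', 'Express', 'Market', 'Market District Express')" \
  "62871,3506,RTANAME in ('ALLIANCE TA')" > "$tmp"

# Keep everything from the third comma-separated field onwards.
cut_out=$(cut -d, -f3- "$tmp")

# Delete the first two "field plus comma" groups.
sed_out=$(sed -E 's/([^,]*,){2}//' "$tmp")

# Same idea in awk, written as two single substitutions to avoid {2}.
awk_out=$(awk '{sub(/[^,]*,/, ""); sub(/[^,]*,/, "")} 1' "$tmp")

rm -f "$tmp"

[ "$cut_out" = "$sed_out" ] && [ "$sed_out" = "$awk_out" ] && echo "all three agree"
printf '%s\n' "$cut_out"
```

One difference worth knowing: for a line that contains no comma at all, cut prints the whole line unchanged unless you add -s.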

Upvotes: 3
