Reputation: 7
I was trying to pick specific columns from my CSV file and print it. I wanted to print all the columns which the headers have "Unposted Final Score" string in them.
I could tailor two pieces of code. One piece is giving me all the correct headers but missing the columns' body. The other piece is giving me only the first full column with the correct header.
I am new to AWK and did my best to tailor something to give me both but I was unsuccessful.
Would any one guide me on this please?
Cheers B :)
Please see my codes here:
This is the code that gives me only the correct headers and not the body of the columns:
{for(i=1;i<=NF;i++)
{if ($i ~ /Unposted Final Score/)
{print $i}
}
}
output:
A1. Unposted Final Score
A2. Proposal and Storyboard Unposted Final Score
=================
This is the code that gives me the first columns and not the rest with the "Unposted Final Score" in them:
{ for (i=1;i<=NF;++i) if ($i ~ /Unposted Final Score/) { n=i; break }} { print $n }
output:
A1. Unposted Final Score
56
95
90
93
0
80
61
=====================
My Current CSV file:
ID,Section,A1. Final Score,A1. Unposted Final Score,A2. Current Score,A2. Proposal and Storyboard Unposted Final Score
4836,Sydney A,1,56,,34
5376,Sydney A,2,95,0,1
4760,Sydney A,3,90,,30
4675,Sydney A,4,93,3,0
4873,Sydney B,0,0,33,50
4848,Sydney A,80,80,0,0
4755,Sydney A,61,61,,0
The Preferred CSV file:
A1. Unposted Final Score,A2. Proposal and Storyboard Unposted Final Score
56,34
95,1
90,30
93,0
0,50
80,0
61,0
Upvotes: 0
Views: 1711
Reputation: 28965
The following was tested with GNU awk
5.1.0 in POSIX mode and with awk
20070501 that comes with macOS. It could be that it doesn't work at all with other versions of awk
.
Your first attempt is a good starting point. Just do something similar on the first line to record the indexes of the columns to print and then, on all records including the first, print only the corresponding fields:
$ cat foo.awk
NR==1 {for(i=1;i<=NF;i++) if($i~/Unposted Final Score/) idx[++n]=i}
{for(i=1;i<=n;i++) printf("%s%s", $(idx[i]), (i==n)?ORS:",")}
$ awk -f foo.awk -F, foo.csv
A1. Unposted Final Score,A2. Proposal and Storyboard Unposted Final Score
56,34
95,1
90,30
93,0
0,50
80,0
61,0
Explanations:
We set the input field separator to comma, instead of the default (spaces), with the -F,
option.
The NR==1
condition is true only for the first record (the header). In this block we store all field indexes matching Unposted Final Score
in the idx
array. Variable n
is used to store the number of matching fields, we will use it for the printing of the output. Instead of idx[++n]=i
we could use {n=n+1; idx[n]=i}
which is probably easier to understand (the numeric value of a variable is automatically initialized to 0).
The third block is executed on all lines (no condition). It prints all fields which index is in the idx
array, followed by a comma, except for the last printed field which is followed by the output record separator (ORS
, a newline by default).
Upvotes: 0
Reputation: 203229
Using any awk in any shell on every Unix box the following will robustly, efficiently, and portably print the fields you want in the same order they occur in the input:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR==1 {
for (i=1; i<=NF; i++) {
if ( $i ~ /Unposted Final Score/ ) {
out2in[++numOutFlds] = i
}
}
}
{
for (o=1; o<=numOutFlds; o++) {
i = out2in[o]
printf "%s%s", $i, (o<numOutFlds ? OFS : ORS)
}
}
$ awk -f tst.awk file
A1. Unposted Final Score,A2. Proposal and Storyboard Unposted Final Score
56,34
95,1
90,30
93,0
0,50
80,0
61,0
See also sh Break CSV files by value of column name matched while retaining header.
Upvotes: 2
Reputation: 181280
Assuming your data is in a file called u.csv
, you could simply try this:
cat u.csv | awk -F',' '{printf $4 "," $6 "\n" }'
It will give you this output:
A1. Unposted Final Score,A2. Proposal and Storyboard Unposted Final Score
56,34
95,1
90,30
93,0
0,50
80,0
61,0
Explanation:
awk -F','
Will split your lines using ,
as a separator.
I.E., the first line:
4836,Sydney A,1,56,,34
So, for awk
the values will be stored in:
$1 -> 4836
$2 -> Sydney A
$3 -> 1
$4 -> 54
$5 ->
$6 -> 34
After that, you just use awk
's built in printf
function to make the output what you said you wanted.
Upvotes: 0