Reputation: 747
Let's say I have a file test.txt
that contains
a,b,c,d,e
1,2,3,4,5
6,7,8,9,10
I want to print out columns based on matching column names, either from another text file or from an array. So for example if I was given
arr=(a b c)
I want my output to then be
a,b,c
1,2,3
6,7,8
How can I do this with bash utilities/awk/sed? My actual text file is 3GB (and the line I want to match column values on is actually line 3), so efficient solutions are appreciated. This is what I have so far:
for j in "${arr[@]}"; do awk -F ',' -v a=$j '{ for(i=1;i<=NF;i++) {if($i==a) {print $i}}}' test.txt; done
but the output I get is
a
b
c
which not only is missing the other rows, but each column name is printed on one line each.
Upvotes: 1
Views: 387
Reputation: 133458
With your shown samples, please try following. Code is reading 2 files another_file.txt(which has a b c
as per samples) and actual Input_file named test.txt(which has all the values in it).
awk '
FNR==NR{
for(i=1;i<=NF;i++){
arr[$i]
}
next
}
FNR==1{
for(i=1;i<=NF;i++){
if($i in arr){
valArr[i]
header=(header?header OFS:"")$i
}
}
print header
next
}
{
val=""
for(i=1;i<=NF;i++){
if(i in valArr){
val=(val?val OFS:"")$i
}
}
print val
}
' another_file.txt FS="," OFS="," test.txt
Output will be as follows:
a,b,c
1,2,3
6,7,8
Explanation: Adding detailed explanation for above solution.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE while reading another_text file here.
for(i=1;i<=NF;i++){ ##Traversing through all fields of current line.
arr[$i] ##Creating arr with index of current field value.
}
next ##next will skip all statements from here.
}
FNR==1{ ##Checking if this is 1st line for test.txt file.
for(i=1;i<=NF;i++){ ##Traversing through all fields of current line.
if($i in arr){ ##If current field values comes in arr then do following.
valArr[i] ##Creating valArr which has index of current field number.
header=(header?header OFS:"")$i ##Creating header which has each field value in it.
}
}
print header ##Printing header here.
next ##next will skip all statements from here.
}
{
val="" ##Nullifying val here.
for(i=1;i<=NF;i++){ ##Traversing through all fields of current line.
if(i in valArr){ ##Checking if i is present in valArr then do following.
val=(val?val OFS:"")$i ##Creating val which has current field value.
}
}
print val ##printing val here.
}
' another_file.txt FS="," OFS="," test.txt ##Mentioning Input_file names here.
Upvotes: 6
Reputation: 785098
Here is how you can do this in a single pass awk command:
arr=(a c e)
awk -v cols="${arr[*]}" 'BEGIN {FS=OFS=","; n=split(cols, tmp, / /); for (i=1; i<=n; ++i) hdr[tmp[i]]} NR==1 {for (i=1; i<=NF; ++i) if ($i in hdr) hnum[i]} {for (i=1; i<=NF; ++i) if (i in hnum) {printf "%s%s", (f ? OFS : ""), $i; f=1} f=0; print ""}' file
a,c,e
1,3,5
6,8,10
A more readable form:
awk -v cols="${arr[*]}" '
BEGIN {
FS = OFS = ","
n = split(cols, tmp, / /)
for (i=1; i<=n; ++i)
hdr[tmp[i]]
}
NR == 1 {
for (i=1; i<=NF; ++i)
if ($i in hdr)
hnum[i]
}
{
for (i=1; i<=NF; ++i)
if (i in hnum) {
printf "%s%s", (f ? OFS : ""), $i
f = 1
}
f = 0
print ""
}' file
Upvotes: 3