Extracting file columns based on header feature

Question

I have a large csv matrix file (input.csv) in the following format:

Patient,sample 66_pos_LC/MS Pos_con,sample 57_net_LC/MS Neg_dis,sample 1_LC/MS Polar_con,sample 3_LC/MS Neg_net
xx,2.5,-7.8,2.5,3.6
ab 1,5.4,3,0.3
yy,43,33,77,55

depending on the first line of the file, I want to split file content based on LC/MS feature. That is, if I want all LC/MS Neg, output file would look like :

output1.csv

Patient,sample 57_net_LC/MS Neg_dis,sample 3_LC/MS Neg_net
xx,-7.8,3.6
ab,5.4,0.3
yy,33,55

For this I use:

head -n 1 input.csv | tr ',' '
' | cat -n | grep 'LC/MS Neg'

This gives the column numbers where I should look (in this case: 3 and 5); to get the above output, I use:

cut -d, -f1,3,5 input.csv > output1.csv

Although it gives me the desired output, but I find this would be complicated for extracting many columns at a time.

I will highly appreciate some awk/sed solution.

Thank you.

Ed Morton · Accepted Answer

$ cat tst.awk
BEGIN { FS=OFS="," }
NR==1 {
    f[++numFlds] = 1
    for (i=2; i<=NF; i++) {
        if ($i ~ sel) {
            f[++numFlds] = i
        }
    }
}
{
    for (i=1; i<=numFlds; i++) {
        printf "%s%s", $(f[i]), (i



.

$ awk -v sel=Neg -f tst.awk file
Patient,sample 57_net_LC/MS Neg_dis,sample 3_LC/MS Neg_net
xx,-7.8,3.6
ab 1,3,
yy,33,55

$ awk -v sel=Pos -f tst.awk file
Patient,sample 66_pos_LC/MS Pos_con
xx,2.5
ab 1,5.4
yy,43

$ awk -v sel=Polar -f tst.awk file
Patient,sample 1_LC/MS Polar_con
xx,2.5
ab 1,0.3
yy,77

$ awk -v sel='Pos|Neg' -f tst.awk file
Patient,sample 66_pos_LC/MS Pos_con,sample 57_net_LC/MS Neg_dis,sample 3_LC/MS Neg_net
xx,2.5,-7.8,3.6
ab 1,5.4,3,
yy,43,33,55

Extracting file columns based on header feature

Answers (2)

Related Questions