crysis405

Reputation: 1131

Cut column by column name in bash

I want to specify a column by name (e.g. 102), find the position of that column, and then use something like cut -f-5,7- with the found position to delete the specified column.

This is my file header (delim = "\t"):

#CHROM  POS 1   100 101 102 103 107 108

Upvotes: 0

Views: 8594

Answers (5)

toyeca

Reputation: 169

Try this small awk utility to cut specific headers - https://github.com/rohitprajapati/toyeca-cutter

Example usage -

awk -f toyeca-cutter.awk -v c="col1, col2, col3, col4" my_file.csv

Upvotes: 0

rici

Reputation: 241731

Here's one possible solution without the restriction that only one column is to be removed. It is written as a bash function, where the first argument is the filename, and the remaining arguments are the columns to exclude.

rmcol() {
  local file=$1
  shift
  cut -f$(head -n1 "$file" | tr \\t \\n | grep -vFxn "${@/#/-e}" |
          cut -d: -f1 | paste -sd,) "$file"
}

If you want to select rather than exclude the named columns, then change -vFxn to -Fxn.

That almost certainly requires some sort of explanation. The first two lines of the function just remove the filename from the arguments and store it for later use. The cut command then selects the appropriate columns; the column numbers are computed by the pipeline inside the command substitution, which is broken down below:

head -n1 "$file" |  # Take the first line of the file
tr \\t \\n       |  # Change all the tabs to newlines [ Note 1]
grep                # Select all lines (i.e. column names) which
     -v             #   don't match
       F            #   the literal string
        x           #   which is the complete line
         n          #   and include the line number in the output
     "${@/#/-e}" |  # Put -e at the beginning of each command line argument,
                    #   converting the arguments into grep pattern arguments (-e)
cut -d: -f1      |  # Select only the line number from each match
paste -sd,          # Paste together all the line numbers, separated with commas.
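
To make the pipeline above more concrete, here is a small sketch that runs it on a made-up two-line file. The sample data and the variable names (sample, keep) are assumptions for illustration only; the column names mirror the header in the question. The trailing - passed to paste is added for portability, since some paste implementations require an explicit input operand.

```shell
# Hypothetical sample file; the header mirrors the one in the question.
sample=$(mktemp)
printf '#CHROM\tPOS\t1\t100\t101\t102\t103\t107\t108\n' > "$sample"
printf 'chr1\t42\ta\tb\tc\td\te\tf\tg\n' >> "$sample"

# Show each header field with its number, as grep -n sees it.
head -n1 "$sample" | tr '\t' '\n' | grep -n ''

# Column numbers that remain after excluding the columns named 102 and 107.
keep=$(head -n1 "$sample" | tr '\t' '\n' | grep -vFxn -e 102 -e 107 |
       cut -d: -f1 | paste -sd, -)
echo "$keep"             # → 1,2,3,4,5,7,9

# Feed the computed list to cut, which prints the remaining columns.
cut -f"$keep" "$sample"

rm -f "$sample"
```

The first command is only there to show the intermediate numbering; the function itself never prints it.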

Upvotes: 2

Deleted User

Reputation: 2541

Trying for a solution that does not loop through the columns, I get:

#!/bin/bash
pick="$1"
titles="pos 1 100 102 105"

tmp=" $titles "
tmp="${tmp%% $pick *}"   # drop the picked title and everything after it (exact match, so 10 does not match 100)
tmp=($tmp)

echo "column ${#tmp[@]}"

Its weakness is that it cannot signal a missing column: if the column name is not found, nothing is stripped and it simply reports the total number of columns.
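
One way around the not-found problem is to check whether the parameter expansion removed anything at all. The following is only a sketch: colpos is a hypothetical helper name, it returns a 1-based position, and it assumes titles contain no whitespace or glob characters.

```shell
# colpos NAME TITLE... : print the 1-based position of NAME among the titles,
# or return 1 (printing nothing) if NAME is not one of them.
colpos() {
  local pick="$1"
  shift
  local all=" $* "                       # pad so every title is space-delimited
  local before="${all%% "$pick" *}"      # text preceding the first exact match
  [ "$before" = "$all" ] && return 1     # nothing was removed: no such title
  set -- $before                         # word-split the titles before the match
  echo $(( $# + 1 ))
}

colpos 102 pos 1 100 102 105             # → 4
colpos 10 pos 1 100 102 105 || echo "not found"
```

The trailing space in the pattern is what keeps 10 from matching 100.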

Upvotes: 0

Tudor Berariu

Reputation: 4910

Using a for loop in bash:

C=1; for i in $(head -n 1 file) ; do if [ "$i" = "102" ] ; then break ; else C=$(( C + 1 )) ; fi ; done ; echo "$C"

And a full script

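# Find the 1-based position of the column named 102 in the header line
C=1
for i in $(head -n 1 in_file) ; do
    if [ "$i" = "102" ] ; then
        break
    fi
    C=$(( C + 1 ))
done
# Print every column except column C (assumes it is not the first column)
cut -f1-$((C-1)),$((C+1))- in_file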
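
The cut range above breaks when the matched column is the first one, because 1-0 is not a valid range. A hedged sketch of the same loop wrapped in a function that also covers the first-column and not-found cases might look like this; dropcol is a hypothetical name, and the titles are assumed to contain no whitespace or glob characters.

```shell
# dropcol NAME FILE : print tab-separated FILE without the column titled NAME.
dropcol() {
  local n=0 pos=0 title
  for title in $(head -n 1 "$2"); do
    n=$(( n + 1 ))
    if [ "$title" = "$1" ]; then
      pos=$n
      break
    fi
  done
  case $pos in
    0) cat "$2" ;;                # name not found: print the file unchanged
    1) cut -f2- "$2" ;;           # first column: there is no left-hand range
    *) cut -f"1-$(( pos - 1 )),$(( pos + 1 ))-" "$2" ;;
  esac
}
```

Usage: dropcol 102 in_file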

Upvotes: 0

anubhava

Reputation: 785156

This awk command finds the column named by c in the header line and then prints that column from every following row:

awk -F'\t' -v c="102" 'NR==1{for (i=1; i<=NF; i++) if ($i==c){p=i; break}; next} {print $p}' file
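
The command above selects the named column. To delete it instead, as the question asks, the same header lookup can drive a loop that rebuilds each row without that field. This is a sketch rather than part of the original answer; delcol is a hypothetical helper name.

```shell
# delcol NAME FILE : print tab-separated FILE without the column titled NAME.
delcol() {
  awk -F'\t' -v OFS='\t' -v c="$1" '
    NR == 1 {                     # locate the column in the header row
      for (i = 1; i <= NF; i++) if ($i == c) { p = i; break }
    }
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++) {
        if (i == p) continue      # skip the named column; p stays unset if absent
        out = out sep $i
        sep = OFS
      }
      print out
    }' "$2"
}
```

Usage: delcol 102 file. If no header field matches, p is never set and every row is printed unchanged.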

Upvotes: 2
