A. Palmer
A. Palmer

Reputation: 149

Bash split and sort by two columns in same file

For my file which looks like this:

AABBCC    10    5    CCAABB    100
BBCCAA    4     2    AABBCC    50
CCAABB    16    8    BBCCAA    20

... I am trying to sort columns 4 and 5, by matching column 4 to column 1.

Ideally this would return:

AABBCC    10    5    AABBCC    50
BBCCAA    4     2    BBCCAA    20
CCAABB    16    8    CCAABB    100

I have tried using sort, however as far as I'm aware it doesn't have a utility to sort within files.

Any help would be greatly appreciated!

Upvotes: 0

Views: 1322

Answers (3)

Amadan
Amadan

Reputation: 198436

join -t $'\t' -o 1.1,1.2,1.3,2.1,2.2 <(cut -f1-3 file.tsv | sort -k 1,1) <(cut -f4- file.tsv | sort -k 1,1) | sort

Cut the original file, then join on the first field of both. We need to specify the full join fields in -o to preserve the first column, or join will eat it.

Upvotes: 2

RomanPerekhrest
RomanPerekhrest

Reputation: 92874

awk solution:

awk 'NR==FNR{ a[$4]=$5; next }$1 in a{ print $1,$2,$3,$1,a[$1] }' file1 OFS="\t" file1

The output:

AABBCC  10      5       AABBCC  50
BBCCAA  4       2       BBCCAA  20
CCAABB  16      8       CCAABB  100

You may pipe to sort at the end: ... | sort

Upvotes: 3

Cyrus
Cyrus

Reputation: 88776

With bash and GNU paste:

With temporary files for illustration:

cut -f 1-3 file | sort > file_1to3
cut -f 4-5 file | sort > file_4to5
paste -d '\t' file_1to3 file_4to5

Without temporary files:

paste -d '\t' <(cut -f 1-3 file | sort) <(cut -f 4-5 file | sort)

Output:

AABBCC  10      5       AABBCC  50
BBCCAA  4       2       BBCCAA  20
CCAABB  16      8       CCAABB  100

Upvotes: 2

Related Questions