user1879573

Reputation: 251

How do I create/sort a table containing a list of matched terms with their corresponding counts?

I am having problems trying to create a table containing a master list of names that have been matched and counted in two separate groups.

The Input_list.txt contains a master list of names and looks like this:

-5S_rRNA
-7SK
-ABCA8
-AC002480.4
-AC002978.1
-RP11-129B22.2

These names have been grep'd and counted in two separate data groups, group1_data.txt and group2_data.txt, which look like this:

group1_data.txt

-5S_rRNA    20
-7SK    25
-AC002480.4 1
-AC002978.1 2

group2_data.txt

-5S_rRNA    1
-ABCA8  1 

I would like to create a table that combines the master Input_list.txt with the two data files, showing each matched name and its count in each group. Where a name has no match in a group, the count should be 0. The result should look like this:

Input       group1  group2
5S_rRNA     20      1
7SK         25      0
ABCA8       0       1
AC002480.4  1       0
AC002978.1  2       0

The number of matched names differs between Input_list.txt and the two data files.

I've tried sort but I'm really stuck. Any suggestions would be great!

Upvotes: 0

Views: 20

Answers (1)

perreal

Reputation: 98028

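Using join, which expects both inputs to be sorted on the name field. The -a 1 option keeps every name from the master list, -e 0 fills in 0 for a missing count, and the final sed drops names that have no match in either group: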

join -e 0 -a 1 -o '1.1 2.2' Input_list.txt group1_data.txt | \
   join -a 1 -e 0 -o '1.1 1.2 2.2' - group2_data.txt | \
   sed '/ 0 0$/d'

Prints:

-5S_rRNA 20 1
-7SK 25 0
-ABCA8 0 1
-AC002480.4 1 0
-AC002978.1 2 0
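
If the header row from the question is also wanted, or the files are not sorted, an awk lookup is one possible alternative. The following is only a sketch, not part of the original answer: the function name build_table is hypothetical, and it assumes whitespace-separated "name count" lines in the two group files and one name per line in the master list.

```shell
# Hypothetical helper: build_table MASTER GROUP1 GROUP2
# Prints a tab-separated header, then one row per master name with its
# count in each group (0 when the name is absent from that group).
# Names with a count of 0 in both groups are skipped.
build_table() {
    awk -v OFS='\t' '
        FILENAME == ARGV[1] { g1[$1] = $2; next }   # counts from GROUP1
        FILENAME == ARGV[2] { g2[$1] = $2; next }   # counts from GROUP2
        FNR == 1 { print "Input", "group1", "group2" }  # header before first master row
        {
            a = ($1 in g1) ? g1[$1] + 0 : 0
            b = ($1 in g2) ? g2[$1] + 0 : 0
            if (a != 0 || b != 0) print $1, a, b
        }
    ' "$2" "$3" "$1"
}
```

Called as build_table Input_list.txt group1_data.txt group2_data.txt, it should print the rows in master-list order. Because it looks names up in arrays instead of merging line by line, it does not depend on the files being sorted.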

Upvotes: 1
