Reputation: 53

Add a column depending on 2 df with python

I have 2 data frame such as :

TAXID

acc_number     taxi 
YP_001378452 2345
YP_001650052 5678
YP_009446812 5435
YP_002192894 7890

and

blast

Nothing  cluster         species     target          score
7101    cluster_000001  species1    YP_001378452.1  31.7    
50457   cluster_000001  species2    YP_001650052.1  27.9    
48798   cluster_000001  species3    YP_002192894.1  34.5    
8514    cluster_000001  species4    YP_009446812.1  28.9

and the idea is to add the column taxi in the df2 BUT as you can see the target are not quite similar because in df2 a .1 in added at the end.

I tried that to explain you better:

TAXID=pd.read_table("/pathtoTAXID.txt",header=0)
blast=pd.read_table("/pathtoblast.txt",header=0)


for i in blast["target"]:
    if i in TAXID["acc_number"] without .1:
        add TAXID[taxi] in the line of the blast

I also tried:

for i in blast["target"]:
    print(TAXID.loc[TAXID["Acc_number"] == i.split('.')[0]][1])

but I'm stuck here to only keep the taxi number Thanks for your help.

Upvotes: 2

Answers (2)

anky

Reputation: 75080

Use s.str.split() with s.map() making a dictionary with the help of dict(zip()):

blast['taxi']=blast.target.str.split(".").str[0].map(dict(zip(TAXID.acc_number,TAXID.taxi)))
print(df2)

   Nothing         cluster   species          target  score  taxi
0     7101  cluster_000001  species1  YP_001378452.1   31.7  2345
1    50457  cluster_000001  species2  YP_001650052.1   27.9  5678
2    48798  cluster_000001  species3  YP_002192894.1   34.5  7890
3     8514  cluster_000001  species4  YP_009446812.1   28.9  5435

Upvotes: 2

BENY

Reputation: 323226

Magic of replace :-) only work all target in blast have the mapping in TAXID

blast['New']=blast.target.replace(dict(zip(TAXID['acc_number'],TAXID['taxi'])),regex=True)
blast
Out[533]: 
   Nothing         cluster   species          target  score   New
0     7101  cluster_000001  species1  YP_001378452.1   31.7  2345
1    50457  cluster_000001  species2  YP_001650052.1   27.9  5678
2    48798  cluster_000001  species3  YP_002192894.1   34.5  7890
3     8514  cluster_000001  species4  YP_009446812.1   28.9  5435

Upvotes: 2

Add a column depending on 2 df with python

Answers (2)

Related Questions