Reputation: 53
I have 2 data frame such as :
TAXID
acc_number taxi
YP_001378452 2345
YP_001650052 5678
YP_009446812 5435
YP_002192894 7890
and
blast
Nothing cluster species target score
7101 cluster_000001 species1 YP_001378452.1 31.7
50457 cluster_000001 species2 YP_001650052.1 27.9
48798 cluster_000001 species3 YP_002192894.1 34.5
8514 cluster_000001 species4 YP_009446812.1 28.9
and the idea is to add the column taxi in the df2 BUT as you can see the target are not quite similar because in df2 a .1 in added at the end.
I tried that to explain you better:
TAXID=pd.read_table("/pathtoTAXID.txt",header=0)
blast=pd.read_table("/pathtoblast.txt",header=0)
for i in blast["target"]:
if i in TAXID["acc_number"] without .1:
add TAXID[taxi] in the line of the blast
I also tried:
for i in blast["target"]:
print(TAXID.loc[TAXID["Acc_number"] == i.split('.')[0]][1])
but I'm stuck here to only keep the taxi number Thanks for your help.
Upvotes: 2
Views: 38
Reputation: 75080
Use s.str.split()
with s.map()
making a dictionary with the help of dict(zip())
:
blast['taxi']=blast.target.str.split(".").str[0].map(dict(zip(TAXID.acc_number,TAXID.taxi)))
print(df2)
Nothing cluster species target score taxi
0 7101 cluster_000001 species1 YP_001378452.1 31.7 2345
1 50457 cluster_000001 species2 YP_001650052.1 27.9 5678
2 48798 cluster_000001 species3 YP_002192894.1 34.5 7890
3 8514 cluster_000001 species4 YP_009446812.1 28.9 5435
Upvotes: 2
Reputation: 323226
Magic of replace
:-) only work all target
in blast
have the mapping in TAXID
blast['New']=blast.target.replace(dict(zip(TAXID['acc_number'],TAXID['taxi'])),regex=True)
blast
Out[533]:
Nothing cluster species target score New
0 7101 cluster_000001 species1 YP_001378452.1 31.7 2345
1 50457 cluster_000001 species2 YP_001650052.1 27.9 5678
2 48798 cluster_000001 species3 YP_002192894.1 34.5 7890
3 8514 cluster_000001 species4 YP_009446812.1 28.9 5435
Upvotes: 2