Reputation: 411
I have one dataframe (df1) with details of Genes and list of organs they are associated with and another mapping dataframe (df2) that mach these organs to unique organ type.
E.g.
df1 <-
data.frame ("Gene_name"=c("Gene1", "Gene2", "Gene3, "Gene4"),
"Organ_name"=c("Skin, Stomach, Eyes, Hair", "Lungs, Mouth, Oesophagus", "Pharynx, Lungs, Throat, Skin", "Stomach, Small intestine"))
df2 <-
data.frame ("Type"=c("External", "External", "External", "External"......"Internal", "Internal", "Internal"...),
"Organ"=c("Skin", "Eyes", "Hair", "Legs",.... "Lungs", "Small intestine", "Oesophagus".....))
I want to see what major category individual genes belong to. Is it frequently present internally or externally?
If I split "Organ_name"
using str.split(",")
then in some cases I am getting around 20 columns. Merging these individual "Organ_name" columns of df1 with "Type"
in df2 using Organ as the key is a big pain.
Is there a better way to analyse this data? How to know the frequency/count for "Type"
of organ? Please let me know
Upvotes: 0
Views: 48
Reputation: 164773
Below is an example of how you can structure your logic with pandas
.
Setup
import pandas as pd
df1 = pd.DataFrame({"Gene_name": ("Gene1", "Gene2", "Gene3", "Gene4"),
"Organ_name": ("Skin, Stomach, Eyes, Hair", "Lungs, Mouth, Oesophagus",
"Pharynx, Lungs, Throat, Skin", "Stomach, Small intestine")})
df2 = pd.DataFrame({"Type": ("External", "External", "External", "External", "Internal", "Internal", "Internal"),
"Organ": ("Skin", "Eyes", "Hair", "Legs", "Lungs", "Small intestine", "Oesophagus")})
Solution
t = df2.set_index('Organ')['Type']
df1['Organ_list'] = df1['Organ_name'].str.split(', ')
df1['Int_Ext'] = [list(filter(None, map(t.get, x))) for x in df1['Organ_list']]
df1['Int_Ext_Flag'] = df1['Int_Ext'].apply(lambda x: 'Internal' if \
x.count('Internal') / len(x) >= 0.5 else 'External')
Result
Gene_name Organ_name Organ_list \
0 Gene1 Skin, Stomach, Eyes, Hair [Skin, Stomach, Eyes, Hair]
1 Gene2 Lungs, Mouth, Oesophagus [Lungs, Mouth, Oesophagus]
2 Gene3 Pharynx, Lungs, Throat, Skin [Pharynx, Lungs, Throat, Skin]
3 Gene4 Stomach, Small intestine [Stomach, Small intestine]
Int_Ext Int_Ext_Flag
0 [External, External, External] External
1 [Internal, Internal] Internal
2 [Internal, External] Internal
3 [Internal] Internal
Explanation
df2
.df1['Organ_list']
to form a list.pd.Series.apply
.list(filter(None, ...))
.Upvotes: 1