Reputation: 23
I have one large data frame that is generated from multiple excel files. I would like to slice the data frame by row and generate separate data frames based on a conditional of the 'Sample Name' column. My data frame to slice would look like:
Well Position Sample Name Target Name CT
0 A1 human 52928.0 40
1 A2 mouse 52928.0 32
2 A3 rat 52928.0 40
3 A4 human 52928.0 40
4 A5 human 52928.0 35
The source excel file may or may not contain data for all three species. For example they may all human samples, all mouse samples, or all rat samples.
My desired result would be:
human_df
Well Position Sample Name Target Name CT
0 A1 human 52928.0 40
1 A4 human 52928.0 40
2 A5 human 52928.0 35
rat_df
Well Position Sample Name Target Name CT
0 A3 rat 52928.0 40
mouse_df
Well Position Sample Name Target Name CT
0 A2 mouse 52928.0 32
My attempt to perform this function is:
for i,row in data.iterrows():
if row['Sample Name'] in data.iterrows() == 'mouse' or 'Mouse' or 'MOUSE':
species = 'mouse'
#make a new df_mouse
df_mouse = data[(data['Sample Name'] == species)]
if row['Sample Name'] in data.iterrows() == 'human' or 'Human' or 'HUMAN':
species = 'human'
df_human = data[(data['Sample Name'] == species)]
print("Human Dataframe = ", df_human)
if row['Sample Name'] in data.iterrows() == 'rat' or 'Rat' or 'RAT':
species = 'rat'
df_rat = data[(data['Sample Name'] == species)]
print("Rat Dataframe = ", df_rat)
This has worked to some extent, but fails when one of three species is not in the original excel file. Thanks in advance for your help.
Upvotes: 1
Views: 332
Reputation: 71687
Use DataFrame.groupby
on column Sample Name
and use dict comprehension to store each grouped df in the dictionary dct
, to refer the stored dataframe just use dct['name_of_df']
:
dct = {f'{k}_df': g.reset_index(drop=True) for k, g in df.groupby('Sample Name')}
Result:
# dct['human_df']
Well Position Sample Name Target Name CT
0 A1 human 52928.0 40
1 A4 human 52928.0 40
2 A5 human 52928.0 35
# dct['rat_df']
Well Position Sample Name Target Name CT
0 A3 rat 52928.0 40
# dct['mouse_df']
Well Position Sample Name Target Name CT
0 A2 mouse 52928.0 32
Upvotes: 1