Istika
Istika

Reputation: 23

How to slice a pandas data frame based on a conditional if statement using df.iterrows and create new dataframe

I have one large data frame that is generated from multiple excel files. I would like to slice the data frame by row and generate separate data frames based on a conditional of the 'Sample Name' column. My data frame to slice would look like:

    Well Position Sample Name  Target Name  CT
0              A1       human      52928.0  40
1              A2       mouse      52928.0  32
2              A3         rat      52928.0  40
3              A4       human      52928.0  40
4              A5       human      52928.0  35

The source excel file may or may not contain data for all three species. For example they may all human samples, all mouse samples, or all rat samples.

My desired result would be:

human_df 
    Well Position Sample Name  Target Name  CT
0              A1       human      52928.0  40
1              A4       human      52928.0  40
2              A5       human      52928.0  35

rat_df
    Well Position Sample Name  Target Name  CT
0              A3         rat      52928.0  40

mouse_df
    Well Position Sample Name  Target Name  CT
0              A2       mouse      52928.0  32

My attempt to perform this function is:

for i,row in data.iterrows():
            if row['Sample Name'] in data.iterrows() == 'mouse' or 'Mouse' or 'MOUSE':
                species = 'mouse'
                #make a new df_mouse
                df_mouse = data[(data['Sample Name'] == species)] 

            if row['Sample Name'] in data.iterrows() == 'human' or 'Human' or 'HUMAN':
                species = 'human'
                df_human = data[(data['Sample Name'] == species)]
                print("Human Dataframe = ", df_human)

            if row['Sample Name'] in data.iterrows() == 'rat' or 'Rat' or 'RAT':
                species = 'rat'
                df_rat = data[(data['Sample Name'] == species)]
                print("Rat Dataframe = ", df_rat)

This has worked to some extent, but fails when one of three species is not in the original excel file. Thanks in advance for your help.

Upvotes: 1

Views: 332

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71687

Use DataFrame.groupby on column Sample Name and use dict comprehension to store each grouped df in the dictionary dct, to refer the stored dataframe just use dct['name_of_df']:

dct = {f'{k}_df': g.reset_index(drop=True) for k, g in df.groupby('Sample Name')}

Result:

# dct['human_df']
  Well Position Sample Name  Target Name  CT
0            A1       human      52928.0  40
1            A4       human      52928.0  40
2            A5       human      52928.0  35

# dct['rat_df']
    Well Position Sample Name  Target Name  CT
0              A3         rat      52928.0  40

# dct['mouse_df']
    Well Position Sample Name  Target Name  CT
0              A2       mouse      52928.0  32

Upvotes: 1

Related Questions