CiaranWelsh
CiaranWelsh

Reputation: 7699

Reorganizing a Pandas DataFrame

I have some data in a python pandas dataframe that looks like this:

         Sample    Signal
225   TGBb_0m-2  1.943295
226   TGBb_5m-2  4.659431
227  TGBb_15m-2  1.713407
228  TGBb_30m-2  2.524867
229  TGBb_45m-2  2.776531
230  TGBb_90m-2  2.196248
231   TGBb_0m-1  2.329916
232   TGBb_5m-1  1.916303
233  TGBb_15m-1  3.892828
234  TGBb_30m-1  2.380105
235  TGBb_45m-1  2.667500
236  TGBb_90m-1  2.377786
237   TGBb_0m-3  1.836953
238  TGBb_15m-3  2.208754
239  TGBb_30m-3  1.561843
240  TGBb_45m-3  2.613384
241  TGBb_90m-3  2.081838

Here I have three replicate experiments, each with 6 time points except for replicate 3 which only has 5. I'm looking to reorder this dataframe such that its grouped by time point, rather than experiment. I think the best way to do this would be to split the big dataframe into smaller dataframes consisting of all data for a single time point. Does anybody know how I can achieve this?

For example, the desired output might look like:

         Sample    Signal
225   TGBb_0m-2  1.943295
231   TGBb_0m-1  2.329916
237   TGBb_0m-3  1.836953


         Sample    Signal
226   TGBb_5m-2  4.659431
232   TGBb_5m-1  1.916303    #missing third data point


227  TGBb_15m-2  1.713407
227  TGBb_15m-2  1.713407
238  TGBb_15m-3  2.208754

Upvotes: 0

Views: 269

Answers (2)

Jarad
Jarad

Reputation: 18983

Your data (for reproducible purposes):

df = pd.DataFrame({'Sample': {225: 'TGBb_0m-2',
            226: 'TGBb_5m-2',
            227: 'TGBb_15m-2',
            228: 'TGBb_30m-2',
            229: 'TGBb_45m-2',
            230: 'TGBb_90m-2',
            231: 'TGBb_0m-1',
            232: 'TGBb_5m-1',
            233: 'TGBb_15m-1',
            234: 'TGBb_30m-1',
            235: 'TGBb_45m-1',
            236: 'TGBb_90m-1',
            237: 'TGBb_0m-3',
            238: 'TGBb_15m-3',
            239: 'TGBb_30m-3',
            240: 'TGBb_45m-3',
            241: 'TGBb_90m-3'},
 'Signal': {225: 1.943295,
            226: 4.659431,
            227: 1.713407,
            228: 2.524867,
            229: 2.7765310000000003,
            230: 2.1962479999999998,
            231: 2.329916,
            232: 1.9163029999999999,
            233: 3.8928279999999997,
            234: 2.380105,
            235: 2.6675,
            236: 2.377786,
            237: 1.8369529999999998,
            238: 2.208754,
            239: 1.561843,
            240: 2.613384,
            241: 2.0818380000000003}},
          columns=['Sample', 'Signal'])

Since you need to group by part of the text in the Sample column, I might use str.extract like this:

df[['Time', 'Experiment']] = df['Sample'].str.extract(r'(.+)-(\d+)')

If you want to perform operations on a dataframe where the 'time points' are the same, I'd use a for loop to filter your initial df where the time point is equivalent to one of the unique time periods.

for time_period in df['Time'].unique():
  df_group = df[df['Time'] == time_period][['Sample', 'Signal']]
  print(df_group)

Produces the following result:

        Sample    Signal
225  TGBb_0m-2  1.943295
231  TGBb_0m-1  2.329916
237  TGBb_0m-3  1.836953
        Sample    Signal
226  TGBb_5m-2  4.659431
232  TGBb_5m-1  1.916303
         Sample    Signal
227  TGBb_15m-2  1.713407
233  TGBb_15m-1  3.892828
238  TGBb_15m-3  2.208754
         Sample    Signal
228  TGBb_30m-2  2.524867
234  TGBb_30m-1  2.380105
239  TGBb_30m-3  1.561843
         Sample    Signal
229  TGBb_45m-2  2.776531
235  TGBb_45m-1  2.667500
240  TGBb_45m-3  2.613384
         Sample    Signal
230  TGBb_90m-2  2.196248
236  TGBb_90m-1  2.377786
241  TGBb_90m-3  2.081838

If your goal is to simply reorder the dataframe by the time and not by the experiment, after using str.extract above, you just need to use df.sort_values('Time')[['Sample', 'Signal']] to get the following result:

         Sample    Signal
231   TGBb_0m-1  2.329916
237   TGBb_0m-3  1.836953
225   TGBb_0m-2  1.943295
233  TGBb_15m-1  3.892828
227  TGBb_15m-2  1.713407
238  TGBb_15m-3  2.208754
228  TGBb_30m-2  2.524867
234  TGBb_30m-1  2.380105
239  TGBb_30m-3  1.561843
229  TGBb_45m-2  2.776531
235  TGBb_45m-1  2.667500
240  TGBb_45m-3  2.613384
226   TGBb_5m-2  4.659431
232   TGBb_5m-1  1.916303
236  TGBb_90m-1  2.377786
230  TGBb_90m-2  2.196248
241  TGBb_90m-3  2.081838

Upvotes: 1

ysearka
ysearka

Reputation: 3855

I think you can achieve this goal with the dataframe.group_by command. Nevertheless you might have to slightly change your columns to split the time point and number of the experiment (for instance 'TGBb_0m-2' has to be split in 'TGBb_0m' in column 'Sample' and '2' in column 'your_new_column')

df = df.groupby(['Sample'],as_index = False)

Note that this way you create a specific kind of dataframe which is a groupby dataframe. So if you want to print it, just typing it's name won't work, you haveto use:

df.head()

Upvotes: 1

Related Questions