Reputation: 7699
I have some data in a python pandas dataframe that looks like this:
Sample Signal
225 TGBb_0m-2 1.943295
226 TGBb_5m-2 4.659431
227 TGBb_15m-2 1.713407
228 TGBb_30m-2 2.524867
229 TGBb_45m-2 2.776531
230 TGBb_90m-2 2.196248
231 TGBb_0m-1 2.329916
232 TGBb_5m-1 1.916303
233 TGBb_15m-1 3.892828
234 TGBb_30m-1 2.380105
235 TGBb_45m-1 2.667500
236 TGBb_90m-1 2.377786
237 TGBb_0m-3 1.836953
238 TGBb_15m-3 2.208754
239 TGBb_30m-3 1.561843
240 TGBb_45m-3 2.613384
241 TGBb_90m-3 2.081838
Here I have three replicate experiments, each with 6 time points except for replicate 3 which only has 5. I'm looking to reorder this dataframe such that its grouped by time point, rather than experiment. I think the best way to do this would be to split the big dataframe into smaller dataframes consisting of all data for a single time point. Does anybody know how I can achieve this?
For example, the desired output might look like:
Sample Signal
225 TGBb_0m-2 1.943295
231 TGBb_0m-1 2.329916
237 TGBb_0m-3 1.836953
Sample Signal
226 TGBb_5m-2 4.659431
232 TGBb_5m-1 1.916303 #missing third data point
227 TGBb_15m-2 1.713407
227 TGBb_15m-2 1.713407
238 TGBb_15m-3 2.208754
Upvotes: 0
Views: 269
Reputation: 18983
Your data (for reproducible purposes):
df = pd.DataFrame({'Sample': {225: 'TGBb_0m-2',
226: 'TGBb_5m-2',
227: 'TGBb_15m-2',
228: 'TGBb_30m-2',
229: 'TGBb_45m-2',
230: 'TGBb_90m-2',
231: 'TGBb_0m-1',
232: 'TGBb_5m-1',
233: 'TGBb_15m-1',
234: 'TGBb_30m-1',
235: 'TGBb_45m-1',
236: 'TGBb_90m-1',
237: 'TGBb_0m-3',
238: 'TGBb_15m-3',
239: 'TGBb_30m-3',
240: 'TGBb_45m-3',
241: 'TGBb_90m-3'},
'Signal': {225: 1.943295,
226: 4.659431,
227: 1.713407,
228: 2.524867,
229: 2.7765310000000003,
230: 2.1962479999999998,
231: 2.329916,
232: 1.9163029999999999,
233: 3.8928279999999997,
234: 2.380105,
235: 2.6675,
236: 2.377786,
237: 1.8369529999999998,
238: 2.208754,
239: 1.561843,
240: 2.613384,
241: 2.0818380000000003}},
columns=['Sample', 'Signal'])
Since you need to group by part of the text in the Sample
column, I might use str.extract
like this:
df[['Time', 'Experiment']] = df['Sample'].str.extract(r'(.+)-(\d+)')
If you want to perform operations on a dataframe where the 'time points' are the same, I'd use a for
loop to filter your initial df
where the time point is equivalent to one of the unique time periods.
for time_period in df['Time'].unique():
df_group = df[df['Time'] == time_period][['Sample', 'Signal']]
print(df_group)
Produces the following result:
Sample Signal
225 TGBb_0m-2 1.943295
231 TGBb_0m-1 2.329916
237 TGBb_0m-3 1.836953
Sample Signal
226 TGBb_5m-2 4.659431
232 TGBb_5m-1 1.916303
Sample Signal
227 TGBb_15m-2 1.713407
233 TGBb_15m-1 3.892828
238 TGBb_15m-3 2.208754
Sample Signal
228 TGBb_30m-2 2.524867
234 TGBb_30m-1 2.380105
239 TGBb_30m-3 1.561843
Sample Signal
229 TGBb_45m-2 2.776531
235 TGBb_45m-1 2.667500
240 TGBb_45m-3 2.613384
Sample Signal
230 TGBb_90m-2 2.196248
236 TGBb_90m-1 2.377786
241 TGBb_90m-3 2.081838
If your goal is to simply reorder the dataframe by the time and not by the experiment, after using str.extract
above, you just need to use df.sort_values('Time')[['Sample', 'Signal']]
to get the following result:
Sample Signal
231 TGBb_0m-1 2.329916
237 TGBb_0m-3 1.836953
225 TGBb_0m-2 1.943295
233 TGBb_15m-1 3.892828
227 TGBb_15m-2 1.713407
238 TGBb_15m-3 2.208754
228 TGBb_30m-2 2.524867
234 TGBb_30m-1 2.380105
239 TGBb_30m-3 1.561843
229 TGBb_45m-2 2.776531
235 TGBb_45m-1 2.667500
240 TGBb_45m-3 2.613384
226 TGBb_5m-2 4.659431
232 TGBb_5m-1 1.916303
236 TGBb_90m-1 2.377786
230 TGBb_90m-2 2.196248
241 TGBb_90m-3 2.081838
Upvotes: 1
Reputation: 3855
I think you can achieve this goal with the dataframe.group_by command. Nevertheless you might have to slightly change your columns to split the time point and number of the experiment (for instance 'TGBb_0m-2' has to be split in 'TGBb_0m' in column 'Sample' and '2' in column 'your_new_column')
df = df.groupby(['Sample'],as_index = False)
Note that this way you create a specific kind of dataframe which is a groupby dataframe. So if you want to print it, just typing it's name won't work, you haveto use:
df.head()
Upvotes: 1