Reputation: 645
I have a dataframe in which "stimulus" and "position" form the index and only "peak_time" is a column:
stimulus  position  peak_time
1         1         1.0
          2         1.5
2         1         2.0
          2         2.0
3         1         2.5
Now I'm trying to squeeze the position column and get lists instead, so that it looks like this:
stimulus peak_time
1 [1.0, 1.5]
2 [2.0, 2.0]
3 [2.5]
It's probably super easy, but I can't find a solution using Google. If someone has already opened this topic, I'd appreciate the link, too. Thanks for your help!
Code to create dataframe:
import random
import numpy as np  # the top-level scipy.nanmedian alias is deprecated; NumPy provides the same function
import pandas as pd
trial = [1,1,2,1,1,2,2,1,2]
stimulus = [1,1,1,2,2,2,2,3,3]
position = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# one median per (stimulus, position) pair
median_ = df.groupby(['stimulus', 'position']).apply(np.nanmedian)
median_ = pd.DataFrame(median_)
median_.columns = ['peak_time']
median_
EDIT
As I can only post one question every 90 minutes, I'd like to ask a follow-up question underneath this post. I now have two pandas Series that look like this:
median_:
stimulus
1 [1.0, 1.5]
2 [2.0, 2.0]
3 [2.0]
quartile_:
stimulus
1 [[1.0, 70.0], [1.0, 183.25]]
2 [[1.0, 65.75], [2.0, 98.75]]
3 [[1.0, 51.25]]
I want to subtract median_ from quartile_ so that I get
distance_:
stimulus
1 [1-1, 70-1], [1.5-1, 183.25-1.5]
2 [2-1, 65.75-1], [2-2, 98.75-2]
3 [2-1, 51.25-2]
Is there a simple way to do that? abs(median_ - quartile_) does not work.
Code to create series:
import random
import numpy as np  # the top-level scipy.nanmedian / scipy.nanpercentile aliases are deprecated; NumPy provides the same functions
import pandas as pd
trial = [1,1,2,1,1,2,2,1,2]
stimulus = [1,1,1,2,2,2,2,3,3]
position = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
# aggregate per (stimulus, position), then collect the results per stimulus into lists
median_ = df.groupby(['stimulus', 'position']).apply(np.nanmedian).groupby(level=0).apply(list)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: np.nanpercentile(x, [25, 75])).groupby(level=0).apply(list)
Solution
Apply groupby(level=0).apply(list) later, after the subtraction, so
import numpy as np  # NumPy equivalents of the deprecated scipy.nanmedian / scipy.nanpercentile aliases
median_ = df.groupby(['stimulus', 'position']).apply(np.nanmedian)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: np.nanpercentile(x, [25, 75]))
Then I can subtract them easily
distance_ = abs(median_ - quartile_)
distance_ = distance_.groupby(level=0).apply(list)
distance_
stimulus
1 [1-1, 70-1], [1.5-1, 183.25-1.5]
2 [2-1, 65.75-1], [2-2, 98.75-2]
3 [2-1, 51.25-2]
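The ordering described above (subtract first, collapse into lists last) can be sketched with fixed numbers so the result is reproducible. This is only an illustration under a few assumptions: the peak_time values are made up, the peak_time column is selected explicitly, and pandas' built-in median() and quantile() stand in for the nanmedian/nanpercentile calls used in the post.

```python
import pandas as pd

# Made-up values (an assumption for illustration, not the poster's data).
df = pd.DataFrame({
    "stimulus": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position": [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 100.0],
})

grouped = df.groupby(["stimulus", "position"])["peak_time"]
median_ = grouped.median()
# One row per (stimulus, position), one column per quantile (0.25 and 0.75).
quartile_ = grouped.quantile([0.25, 0.75]).unstack()

# Subtract while both objects still share the flat (stimulus, position) index.
distance_ = quartile_.sub(median_, axis=0).abs()

# Only now collapse: one [q25 distance, q75 distance] pair per position,
# gathered into a list per stimulus.
pairs = pd.Series(distance_.values.tolist(), index=distance_.index)
distance_ = pairs.groupby(level=0).apply(list)
print(distance_)
```

Because the subtraction happens on aligned numeric data, no arithmetic on Python lists is ever needed; the lists are built purely for presentation at the end.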
Upvotes: 2
Views: 102
Reputation: 862601
median_ is a Series with a MultiIndex, so you need Series.groupby on the first index level with apply(list):
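# select the peak_time column before aggregating
import numpy as np  # np.nanmedian replaces the deprecated scipy.nanmedian alias
median_ = df.groupby(['stimulus', 'position'])['peak_time'].apply(np.nanmedian)
# group on the first index level (stimulus) and collect each group's values in a list
df = median_.groupby(level=0).apply(list).reset_index()
print(df)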
stimulus peak_time
0 1 [1.0, 1.5]
1 2 [2.0, 2.0]
2 3 [2.5]
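For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The peak_time values are fixed rather than random (an assumption chosen so the medians match the table in the question), and the built-in median() is used in place of nanmedian; it skips NaN by default.

```python
import pandas as pd

# Fixed values (assumed for illustration) so the output is reproducible.
df = pd.DataFrame({
    "stimulus": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position": [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [1.0, 1.5, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0],
})

# Median per (stimulus, position): a Series with a two-level MultiIndex.
median_ = df.groupby(["stimulus", "position"])["peak_time"].median()

# Group on the first index level and gather each group's values in a list.
result = median_.groupby(level=0).apply(list).reset_index()
print(result)
```

The result has one row per stimulus, with peak_time holding [1.0, 1.5], [2.0, 2.0] and [2.5].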
Upvotes: 3