Svenno Nito

Reputation: 645

Python: create nested lists from a long-format pandas DataFrame

I have a DataFrame in which "stimulus" and "position" are index levels and only "peak_time" is a column:

stimulus position peak_time 
1        1        1.0
         2        1.5
2        1        2.0
         2        2.0
3        1        2.5

Now I'm trying to collapse the position level and get lists instead, so that it looks like this:

stimulus peak_time  
1        [1.0, 1.5]
2        [2.0, 2.0]
3        [2.5]

It's probably super easy, but I can't find a solution using Google. If someone has already asked this, I'd appreciate the link, too. Thanks for your help!

Code to create dataframe:

import random, scipy
import pandas as pd
trial     = [1,1,2,1,1,2,2,1,2]
stimulus  = [1,1,1,2,2,2,2,3,3] 
position  = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df        = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
median_   = df.groupby(['stimulus', 'position']).apply(scipy.nanmedian)
median_   = pd.DataFrame(median_)
median_.columns = ['peak_time']
median_

EDIT

As I can only post one question every 90 minutes, I'd like to ask a follow-up question underneath this post. I now have two pandas Series that look like this:

median_:
stimulus
1    [1.0, 1.5]
2    [2.0, 2.0]
3    [2.0]

quartile_:
stimulus
1    [[1.0, 70.0],  [1.0, 183.25]]
2    [[1.0, 65.75], [2.0, 98.75]]
3    [[1.0, 51.25]]

I want to subtract median_ from quartile_ so that I get

distance_: 
stimulus
1   [1-1, 70-1], [1.5-1, 183.25-1.5]
2   [2-1, 65.75-1], [2-2, 98.75-2]
3   [2-1, 51.25-2]

Is there a simple way to do that? abs(median_ - quartile_) does not work.

Code to create series:

import random, scipy
import pandas as pd
trial     = [1,1,2,1,1,2,2,1,2]
stimulus  = [1,1,1,2,2,2,2,3,3] 
position  = [1,2,1,1,2,1,2,1,1]
peak_time = random.sample(range(1000), 9)
df        = pd.DataFrame({"trial": trial, "stimulus": stimulus, "position": position, "peak_time": peak_time})
median_   = df.groupby(['stimulus', 'position']).apply(scipy.nanmedian).groupby(level=0).apply(list)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: scipy.nanpercentile(x, [25, 75])).groupby(level=0).apply(list)

Solution

Apply groupby(level=0).apply(list) later, i.e. only after the subtraction. First compute both statistics per (stimulus, position):

median_   = df.groupby(['stimulus', 'position']).apply(scipy.nanmedian)
quartile_ = df.groupby(['stimulus', 'position']).apply(lambda x: scipy.nanpercentile(x, [25, 75]))

Then I can subtract them easily:

distance_ = abs(median_ - quartile_)
distance_ = distance_.groupby(level=0).apply(list)
distance_

stimulus
1   [1-1, 70-1], [1.5-1, 183.25-1.5]
2   [2-1, 65.75-1], [2-2, 98.75-2]
3   [2-1, 51.25-2]
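
For anyone who wants to reproduce the idea on current library versions, here is a minimal sketch of the same "subtract first, nest afterwards" approach. It rests on a few assumptions that are not in my original code: the peak_time values are made up and fixed (instead of random.sample), and pandas' own groupby median() and quantile() stand in for scipy.nanmedian and scipy.nanpercentile, since I am not sure those top-level scipy names are still available in recent SciPy releases.

```python
import pandas as pd

# Made-up, fixed peak_time values (assumption) so the result is reproducible.
df = pd.DataFrame({
    "trial":     [1, 1, 2, 1, 1, 2, 2, 1, 2],
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})

grouped = df.groupby(["stimulus", "position"])["peak_time"]

# One median per (stimulus, position).
median_ = grouped.median()

# 25th and 75th percentile per (stimulus, position); unstack() moves the
# quantile level into two columns named 0.25 and 0.75.
quartile_ = grouped.quantile([0.25, 0.75]).unstack()

# Subtract the median row-wise while everything is still numeric.
flat_distance = quartile_.sub(median_, axis=0).abs()

# Only now turn each row into a [d25, d75] pair and nest the pairs per stimulus.
pairs = pd.Series(flat_distance.to_numpy().tolist(), index=flat_distance.index)
distance_ = pairs.groupby(level="stimulus").apply(list)
print(distance_)
```

The point of the ordering is that the subtraction happens on plain numeric columns, so index alignment does the work; lists are created only at the very end, for display.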

Upvotes: 2

Views: 102

Answers (1)

jezrael

Reputation: 862601

median_ is a Series with a MultiIndex, so you need Series.groupby on the first index level, followed by apply(list):

# select the peak_time column before taking the median
median_   = df.groupby(['stimulus', 'position'])['peak_time'].apply(scipy.nanmedian)
df        = median_.groupby(level=0).apply(list).reset_index()
print (df)
   stimulus   peak_time
0         1  [1.0, 1.5]
1         2  [2.0, 2.0]
2         3       [2.5]
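
A self-contained version of the above, as a rough sketch: the peak_time values here are fixed, illustrative numbers (an assumption, chosen so the medians match the example in the question), and np.nanmedian is swapped in for scipy.nanmedian in case your SciPy version does not provide that name.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data (assumption); the question uses random values.
df = pd.DataFrame({
    "trial":     [1, 1, 2, 1, 1, 2, 2, 1, 2],
    "stimulus":  [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "position":  [1, 2, 1, 1, 2, 1, 2, 1, 1],
    "peak_time": [1.0, 1.5, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0],
})

# Median of peak_time per (stimulus, position): a Series with a
# two-level MultiIndex.
median_ = df.groupby(["stimulus", "position"])["peak_time"].apply(np.nanmedian)

# Group again on the outer level only and collect each group's values in a list.
nested = median_.groupby(level="stimulus").apply(list).reset_index()
print(nested)
```

With this data the peak_time column holds [1.0, 1.5], [2.0, 2.0] and [2.5]. Grouping by level="stimulus" is the same as level=0 here; the name just makes the intent easier to read.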

Upvotes: 3
