viraptor
viraptor

Reputation: 34155

Converting time series into a heatmap

I'm looking for a good transformation in pandas which would allow me to go from a time series of measurements into a count-per-bin-per-timebin list.

Let's say I've got my:

x = list(range(count))
y = [random.gauss(1, 0.1) for _ in range(count)]

I can transform this into binned intervals on both sides:

df = pandas.DataFrame.from_dict({'x': x, 'y': y})
df['x'].update(pandas.cut(df['x'], 20))
df['y'].update(pandas.cut(df['y'], 20))

I know I can get the value counts for the ys using:

df['y'].value_counts()

But I'm having trouble putting the "run value_counts on y grouped by unique x values, then unroll, and return that" into a valid operation.


Example:

y = [1, 1, 2, 3, 4, 4]
x = [0, 1, 2, 3, 4, 5]
bin_count = 2

expected:

df: x    y  count
    0-2  1  2
    0-2  2  1
    3-5  3  1
    3-5  4  2

Upvotes: 1

Views: 129

Answers (1)

jezrael
jezrael

Reputation: 862761

I believe you need SeriesGroupBy.value_counts with reset_index:

y = [1, 1, 2, 3, 4, 4]
x = [0, 1, 2, 3, 4, 5]
bin_count = 2
df = pd.DataFrame.from_dict({'x': x, 'y': y})
df['x'].update(pd.cut(df['x'], bin_count))

df1 = df.groupby('x')['y'].value_counts().reset_index(name='count')
print (df1)
               x  y  count
0  (-0.005, 2.5]  1      2
1  (-0.005, 2.5]  2      1
2     (2.5, 5.0]  4      2
3     (2.5, 5.0]  3      1

For columns from y use unstack:

df1 = df.groupby('x')['y'].value_counts().unstack(fill_value=0)
print (df1)
y              1  2  3  4
x                        
(-0.005, 2.5]  2  1  0  0
(2.5, 5.0]     0  0  1  2

EDIT:

If need unique values for bins add parameter labels=False to cut:

df['x'].update(pd.cut(df['x'], bin_count, labels=False))

df1 = df.groupby('x')['y'].value_counts().unstack(fill_value=0)
print (df1)
y  1  2  3  4
x            
0  2  1  0  0
1  0  0  1  2

Upvotes: 2

Related Questions