Reputation: 6223

pandas: unstacking DataFrame to Series causes NaN

I have the following dataframe and want to unstack it, yielding a series x and a series y with time as the index and value as the data:

   var  time  value
0    x     0     11
1    y     0    123
2    x     1     12
3    y     1    124
4    x     2     13
5    y     2    125

This is my code:

import pandas as pd

df = pd.DataFrame({
        'time': [0,0,1,1,2,2],
        'var': list('xyxyxy'),
        'value': [11,123,12,124,13,125]})

for col in ['x', 'y']:
    s = pd.Series(
            data=df.loc[df['var'] == col, 'value'],
            index=df.loc[df['var'] == col, 'time'],
            name=col)
    print(s)

This is the output:

time
0    11.0
1     NaN
2    12.0
Name: x, dtype: float64

time
0      NaN
1    123.0
2      NaN
Name: y, dtype: float64

But I expect the series to be

time
0    11.0
1    12.0 
2    13.0
Name: x, dtype: float64

time
0    123.0
1    124.0
2    125.0
Name: y, dtype: float64

Obviously pandas doesn't align the time-axis correctly to the value-axis. To my understanding each .loc should just return the corresponding 3 elements and assemble them together as index and data for the newly constructed series.

Why isn't this happening?
What would be the easiest way to get my desired result?

Upvotes: 1

Answers (3)

Vaishali

Reputation: 38415

There are better ways to achieve this as @Wen pointed out. Incase you are wondering whats wrong with your code, you are passing a series as data and index, pass an array instead.

for v in df['var'].unique():
    s = pd.Series(
        data=df.loc[df['var'] == v, 'value'].values,
        index=df.loc[df['var'] == v, 'time'].values,
        name=col)
    print(s)

0    11
1    12
2    13
Name: y, dtype: int64
0    123
1    124
2    125
Name: y, dtype: int64

Upvotes: 1

Joe Patten

Reputation: 1704

I am not sure why putting a Series as your data gives you that result, but I can answer your second question

What would be the easiest way to get my desired result?

You can convert your data to a list to get the desired result.

for col in ['x', 'y']:
    s = pd.Series(
            data=df.loc[df['var'] == col, 'value'].tolist(),
            index=df.loc[df['var'] == col, 'time'],
            name=col)
    print(s)

Output:

time
0    11
1    12
2    13
Name: x, dtype: int64
time
0    123
1    124
2    125
Name: y, dtype: int64

Here is another way similar to the answer submitted by @Wen (and more elegant):

df_new = pd.pivot_table(data=df, columns='var', index='time', values='value')

Output:

var    x    y
time         
0     11  123
1     12  124
2     13  125

Then you can call df_new['x'] or df_new['y'].

Upvotes: 0

BENY

Reputation: 323306

This is pivot problem

s=df.pivot(*df.columns)
s
Out[56]: 
time    0    1    2
var                
x      11   12   13
y     123  124  125

#s['y'],s['x']

Upvotes: 2

pandas: unstacking DataFrame to Series causes NaN

Answers (3)

Related Questions