ascripter
ascripter

Reputation: 6223

pandas: unstacking DataFrame to Series causes NaN

I have the following dataframe and want to unstack it, yielding a series x and a series y with time as the index and value as the data:

   var  time  value
0    x     0     11
1    y     0    123
2    x     1     12
3    y     1    124
4    x     2     13
5    y     2    125

This is my code:

import pandas as pd

df = pd.DataFrame({
        'time': [0,0,1,1,2,2],
        'var': list('xyxyxy'),
        'value': [11,123,12,124,13,125]})

for col in ['x', 'y']:
    s = pd.Series(
            data=df.loc[df['var'] == col, 'value'],
            index=df.loc[df['var'] == col, 'time'],
            name=col)
    print(s)

This is the output:

time
0    11.0
1     NaN
2    12.0
Name: x, dtype: float64

time
0      NaN
1    123.0
2      NaN
Name: y, dtype: float64

But I expect the series to be

time
0    11.0
1    12.0 
2    13.0
Name: x, dtype: float64

time
0    123.0
1    124.0
2    125.0
Name: y, dtype: float64

Obviously pandas doesn't align the time-axis correctly to the value-axis. To my understanding each .loc should just return the corresponding 3 elements and assemble them together as index and data for the newly constructed series.

  1. Why isn't this happening?
  2. What would be the easiest way to get my desired result?

Upvotes: 1

Views: 1176

Answers (3)

Vaishali
Vaishali

Reputation: 38415

There are better ways to achieve this as @Wen pointed out. Incase you are wondering whats wrong with your code, you are passing a series as data and index, pass an array instead.

for v in df['var'].unique():
    s = pd.Series(
        data=df.loc[df['var'] == v, 'value'].values,
        index=df.loc[df['var'] == v, 'time'].values,
        name=col)
    print(s)

0    11
1    12
2    13
Name: y, dtype: int64
0    123
1    124
2    125
Name: y, dtype: int64

Upvotes: 1

Joe Patten
Joe Patten

Reputation: 1704

I am not sure why putting a Series as your data gives you that result, but I can answer your second question

  1. What would be the easiest way to get my desired result?

You can convert your data to a list to get the desired result.

for col in ['x', 'y']:
    s = pd.Series(
            data=df.loc[df['var'] == col, 'value'].tolist(),
            index=df.loc[df['var'] == col, 'time'],
            name=col)
    print(s)

Output:

time
0    11
1    12
2    13
Name: x, dtype: int64
time
0    123
1    124
2    125
Name: y, dtype: int64

Here is another way similar to the answer submitted by @Wen (and more elegant):

df_new = pd.pivot_table(data=df, columns='var', index='time', values='value')

Output:

var    x    y
time         
0     11  123
1     12  124
2     13  125

Then you can call df_new['x'] or df_new['y'].

Upvotes: 0

BENY
BENY

Reputation: 323306

This is pivot problem

s=df.pivot(*df.columns)
s
Out[56]: 
time    0    1    2
var                
x      11   12   13
y     123  124  125

#s['y'],s['x']

Upvotes: 2

Related Questions