Reputation: 6223
I have the following dataframe and want to unstack it, yielding a series x
and a series y
with time
as the index and value
as the data:
var time value
0 x 0 11
1 y 0 123
2 x 1 12
3 y 1 124
4 x 2 13
5 y 2 125
This is my code:
import pandas as pd
df = pd.DataFrame({
'time': [0,0,1,1,2,2],
'var': list('xyxyxy'),
'value': [11,123,12,124,13,125]})
for col in ['x', 'y']:
s = pd.Series(
data=df.loc[df['var'] == col, 'value'],
index=df.loc[df['var'] == col, 'time'],
name=col)
print(s)
This is the output:
time
0 11.0
1 NaN
2 12.0
Name: x, dtype: float64
time
0 NaN
1 123.0
2 NaN
Name: y, dtype: float64
But I expect the series to be
time
0 11.0
1 12.0
2 13.0
Name: x, dtype: float64
time
0 123.0
1 124.0
2 125.0
Name: y, dtype: float64
Obviously pandas doesn't align the time
-axis correctly to the value
-axis. To my understanding each .loc
should just return the corresponding 3 elements and assemble them together as index and data for the newly constructed series.
Upvotes: 1
Views: 1176
Reputation: 38415
There are better ways to achieve this as @Wen pointed out. Incase you are wondering whats wrong with your code, you are passing a series as data and index, pass an array instead.
for v in df['var'].unique():
s = pd.Series(
data=df.loc[df['var'] == v, 'value'].values,
index=df.loc[df['var'] == v, 'time'].values,
name=col)
print(s)
0 11
1 12
2 13
Name: y, dtype: int64
0 123
1 124
2 125
Name: y, dtype: int64
Upvotes: 1
Reputation: 1704
I am not sure why putting a Series as your data gives you that result, but I can answer your second question
You can convert your data to a list to get the desired result.
for col in ['x', 'y']:
s = pd.Series(
data=df.loc[df['var'] == col, 'value'].tolist(),
index=df.loc[df['var'] == col, 'time'],
name=col)
print(s)
Output:
time
0 11
1 12
2 13
Name: x, dtype: int64
time
0 123
1 124
2 125
Name: y, dtype: int64
Here is another way similar to the answer submitted by @Wen (and more elegant):
df_new = pd.pivot_table(data=df, columns='var', index='time', values='value')
Output:
var x y
time
0 11 123
1 12 124
2 13 125
Then you can call df_new['x']
or df_new['y']
.
Upvotes: 0
Reputation: 323306
This is pivot
problem
s=df.pivot(*df.columns)
s
Out[56]:
time 0 1 2
var
x 11 12 13
y 123 124 125
#s['y'],s['x']
Upvotes: 2