Reputation: 932
I currently have a Pandas DataFrame looking like this:
DATESTAMP price name pct_chg
0 2006-01-02 62.987301 a 0.000000
1 2006-01-03 61.990700 a -0.015822
2 2006-01-04 62.987301 a 0.016077
3 2006-01-05 62.987301 a 0.000000
4 2006-01-06 61.990700 a -0.015822
6 2006-01-04 100.1 b 0.000000
7 2006-01-05 100.5 b -0.015822
8 2006-01-06 100.7 b 0.016077
9 2006-01-07 100.8 b 0.016090
The problem is that different items (specified with unique column name
) have different time of origination as well as being alive for different amount of time
a
starts at 2016-01-02
and ends at 2016-01-06
b
starts at 2006-01-04
and ends at 2006-01-07
.I would like to summarize the column pct_chg
in a new DataFrame, having DATESTAMP
as index and columns as of name
. I would also like the new DataFrame to have the index in such a manner that it starts with the "oldest" existing date-record (in this case 2006-01-02
) and ends at the "newest" (in this case 2006-01-07
).
The result would look like
a b
2006-01-02 0.000000 NaN
2006-01-03 -0.015822 NaN
2006-01-04 0.016077 0.000000
2006-01-05 0.000000 -0.015822
2006-01-06 -0.015822 0.016077
2006-01-07 NaN 0.016090
Upvotes: 2
Views: 45
Reputation: 863166
You can use set_index
with unstack
:
print (df.set_index(['DATESTAMP','name'])['pct_chg'].unstack())
name a b
DATESTAMP
2006-01-02 0.000000 NaN
2006-01-03 -0.015822 NaN
2006-01-04 0.016077 0.000000
2006-01-05 0.000000 -0.015822
2006-01-06 -0.015822 0.016077
2006-01-07 NaN 0.016090
Another solution with pivot
:
print (df.pivot(index='DATESTAMP', columns='name', values='pct_chg'))
name a b
DATESTAMP
2006-01-02 0.000000 NaN
2006-01-03 -0.015822 NaN
2006-01-04 0.016077 0.000000
2006-01-05 0.000000 -0.015822
2006-01-06 -0.015822 0.016077
2006-01-07 NaN 0.016090
Upvotes: 2