Reputation: 322
Here is my dataframe:
ID AMT DATE
0 1496846 54.76 2015-02-11
1 1496846 195.00 2015-01-09
2 1571558 11350.00 2015-04-30
3 1498812 135.00 2014-07-11
4 1498812 157.00 2014-08-04
5 1498812 110.00 2014-09-23
6 1498812 1428.00 2015-01-28
7 1558450 4355.00 2015-01-26
8 1858606 321.52 2015-03-27
9 1849431 1046.81 2015-03-19
I would like to make this a dataframe consisting of time series data for each ID
. That is, each column name is a date (sorted), and it is indexed by ID
, and the values are the AMT
values corresponding to each date. I can get so far as doing something like
df.set_index("DATE").T
but from here I'm stuck.
I also tried
df.pivot(index='ID', columns='DATE', values='AMT')
but this gave me an error on having duplicate entries (the ID
s).
I envision it as transposing DATE
, and then grouping by unique ID
and melting AMT
underneath.
Upvotes: 1
Views: 119
Reputation: 294506
you want to use pivot_table
where there is an aggfunc
parameter that handles duplicate indices.
df.pivot_table('AMT', 'DATE', 'ID', aggfunc='sum')
You'll want to choose how to handle the dups. I put 'sum'
in there. It defaults to 'mean'
Upvotes: 1