Create or modify DataFrame using another DataFrame

Question

I currently have a Pandas DataFrame looking like this:

   DATESTAMP   price                name   pct_chg
0  2006-01-02  62.987301            a      0.000000
1  2006-01-03  61.990700            a     -0.015822
2  2006-01-04  62.987301            a      0.016077
3  2006-01-05  62.987301            a      0.000000
4  2006-01-06  61.990700            a     -0.015822
6  2006-01-04  100.1                b      0.000000
7  2006-01-05  100.5                b     -0.015822
8  2006-01-06  100.7                b      0.016077
9  2006-01-07  100.8                b      0.016090

The problem is that different items (specified with unique column name) have different time of origination as well as being alive for different amount of time

Above item a starts at 2016-01-02 and ends at 2016-01-06
Above item b starts at 2006-01-04 and ends at 2006-01-07.

I would like to summarize the column pct_chg in a new DataFrame, having DATESTAMP as index and columns as of name. I would also like the new DataFrame to have the index in such a manner that it starts with the "oldest" existing date-record (in this case 2006-01-02) and ends at the "newest" (in this case 2006-01-07).

The result would look like

            a          b
2006-01-02  0.000000   NaN
2006-01-03  -0.015822  NaN
2006-01-04  0.016077   0.000000
2006-01-05  0.000000   -0.015822
2006-01-06  -0.015822  0.016077
2006-01-07  NaN        0.016090

jezrael · Accepted Answer

You can use set_index with unstack:

print (df.set_index(['DATESTAMP','name'])['pct_chg'].unstack())
name               a         b
DATESTAMP                     
2006-01-02  0.000000       NaN
2006-01-03 -0.015822       NaN
2006-01-04  0.016077  0.000000
2006-01-05  0.000000 -0.015822
2006-01-06 -0.015822  0.016077
2006-01-07       NaN  0.016090

Another solution with pivot:

print (df.pivot(index='DATESTAMP', columns='name', values='pct_chg'))
name               a         b
DATESTAMP                     
2006-01-02  0.000000       NaN
2006-01-03 -0.015822       NaN
2006-01-04  0.016077  0.000000
2006-01-05  0.000000 -0.015822
2006-01-06 -0.015822  0.016077
2006-01-07       NaN  0.016090

Create or modify DataFrame using another DataFrame

Answers (1)

Related Questions