Reputation: 719
I have a DataFrame analogous to this one:
import numpy as np
import pandas
dd = pandas.DataFrame({'name' : ['foo', 'foo', 'foo', 'bar',
                                 'bar', 'bar', 'bar', 'bar'],
                       'year' : ['1900', '1903', '1904', '1900',
                                 '1901', '1902', '1903', '1904'],
                       'value' : np.arange(8)
                      })
Further along the pipeline I will need to compare foo and bar in terms of magnitudes derived from value. This is why I would like to add rows for the missing years in foo and fill them with NaN.
So the final dd should have additional rows and look like this:
   value name  year
0    0.0  foo  1900
1    NaN  foo  1901
2    NaN  foo  1902
3    1.0  foo  1903
4    2.0  foo  1904
5    3.0  bar  1900
6    4.0  bar  1901
7    5.0  bar  1902
8    6.0  bar  1903
9    7.0  bar  1904
I tried using this solution, but it doesn't work in this case because I have duplicate values in the year column.
I realize I have to add rows grouping by 'name', but I couldn't see how.
What should I do?
Upvotes: 1
Views: 73
Reputation: 323226
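IIUC, you can unstack the years into columns and then stack them back with dropna=False, so the missing name/year combinations are kept as NaN: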
dd.set_index(['name','year']).value.unstack().stack(dropna=False).reset_index()
Out[983]:
  name  year    0
0  bar  1900  3.0
1  bar  1901  4.0
2  bar  1902  5.0
3  bar  1903  6.0
4  bar  1904  7.0
5  foo  1900  0.0
6  foo  1901  NaN
7  foo  1902  NaN
8  foo  1903  1.0
9  foo  1904  2.0
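If you would rather keep the value column name and the original foo/bar order, a possible alternative is to reindex against the full set of name/year pairs. This is only a sketch, not the method used above: the names full_index and filled are made up for illustration, and it assumes that each (name, year) pair occurs at most once and that only years already present for at least one name need to be filled in.

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({
    'name': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar'],
    'year': ['1900', '1903', '1904', '1900', '1901', '1902', '1903', '1904'],
    'value': np.arange(8),
})

# Every (name, year) pair that should exist in the result.
# Assumption: the years to fill are those seen for at least one name.
full_index = pd.MultiIndex.from_product(
    [dd['name'].unique(), sorted(dd['year'].unique())],
    names=['name', 'year'],
)

# Reindexing inserts the missing pairs and fills their value with NaN.
# This requires the (name, year) pairs in dd to be unique.
filled = (
    dd.set_index(['name', 'year'])
      .reindex(full_index)
      .reset_index()
)
print(filled)
```

Because NaN is introduced, the value column ends up as float, the same as in the output above.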
Upvotes: 1