Federico Barabas

How to add NaN rows groupwise for missing values

I have a DataFrame analogous to this one:

import numpy as np
import pandas

dd = pandas.DataFrame({'name' : ['foo', 'foo', 'foo', 'bar',
                                 'bar', 'bar', 'bar', 'bar'],
                       'year' : ['1900', '1903', '1904', '1900',
                                 '1901', '1902', '1903', '1904'],
                       'value' : np.arange(8)
                       })

Later in the pipeline I need to compare foo and bar using quantities derived from value. That is why I would like to add rows for the years missing from foo (1901 and 1902) and fill their value with NaN.

So the final dd should have additional rows and look like this:

  name  year  value
0  foo  1900    0.0
1  foo  1901    NaN
2  foo  1902    NaN
3  foo  1903    1.0
4  foo  1904    2.0
5  bar  1900    3.0
6  bar  1901    4.0
7  bar  1902    5.0
8  bar  1903    6.0
9  bar  1904    7.0

I tried using this solution but it doesn't work in this case because I have duplicate values in the year column.
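If that solution reindexes on year alone, the duplicates are the obstacle, because reindex needs a unique index. Here is a minimal sketch of one way around it, assuming the complete set of years is every year that appears anywhere in dd. Each (name, year) pair is unique even though year repeats, so the pair can serve as the index, and pd.MultiIndex.from_product builds the grid of all pairs. The names full_index and result are illustrative.

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({'name': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar'],
                   'year': ['1900', '1903', '1904', '1900', '1901', '1902', '1903', '1904'],
                   'value': np.arange(8)})

# 'year' alone has duplicates, but every (name, year) pair is unique,
# so this MultiIndex is a valid target for reindex.
indexed = dd.set_index(['name', 'year'])

# All combinations of the names and years already present in dd
# (names keep their order of first appearance, years are sorted).
full_index = pd.MultiIndex.from_product(
    [dd['name'].unique(), sorted(dd['year'].unique())],
    names=['name', 'year'])

# Pairs absent from dd become new rows whose 'value' is NaN.
result = indexed.reindex(full_index).reset_index()
print(result)
```

A year that is missing for every name would not be added; passing an explicit list of years to from_product instead would cover that case.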

I realize I need to add the rows per 'name' group, but I can't see how.
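The per-name grouping idea can be sketched as follows, assuming again that the full set of years is every year present in dd. fill_years is a hypothetical helper: it reindexes one group's rows on that year list, and groupby(...).apply puts the groups back together with name as the outer index level.

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({'name': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar'],
                   'year': ['1900', '1903', '1904', '1900', '1901', '1902', '1903', '1904'],
                   'value': np.arange(8)})

# Every year seen for any name, as an index named 'year'.
all_years = pd.Index(sorted(dd['year'].unique()), name='year')

def fill_years(group):
    # Within a single name, 'year' is unique, so reindex works;
    # years this group lacks get NaN in 'value'.
    return group.set_index('year').reindex(all_years)

# Selecting only 'year' and 'value' keeps the grouping column out of
# the groups; sort=False keeps names in order of first appearance.
result = (dd.groupby('name', sort=False)[['year', 'value']]
            .apply(fill_years)
            .reset_index())
print(result)
```

The result has the same name, year and value layout as the desired output above, with foo listed before bar.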

What should I do?

Answers (1)

BENY

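If I understand correctly, you can move year into the columns with unstack, which leaves NaN wherever a name has no row for that year, and then stack it back with dropna=False so those NaN entries are kept: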

dd.set_index(['name', 'year'])['value'].unstack().stack(dropna=False).reset_index(name='value')
Out[983]:
  name  year  value
0  bar  1900    3.0
1  bar  1901    4.0
2  bar  1902    5.0
3  bar  1903    6.0
4  bar  1904    7.0
5  foo  1900    0.0
6  foo  1901    NaN
7  foo  1902    NaN
8  foo  1903    1.0
9  foo  1904    2.0
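In recent pandas versions the dropna argument of DataFrame.stack is deprecated together with the old stack implementation, so the same unstack/stack round trip may be easier to keep working if written with pivot and melt, which keeps NaN cells without any extra argument. This is a sketch under that assumption; wide and tidy are illustrative names.

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({'name': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'bar'],
                   'year': ['1900', '1903', '1904', '1900', '1901', '1902', '1903', '1904'],
                   'value': np.arange(8)})

# Wide table: one row per name, one column per year.
# (name, year) pairs with no data become NaN cells.
wide = dd.pivot(index='name', columns='year', values='value')

# Back to long format; melt keeps the NaN cells as rows.
tidy = (wide.reset_index()
            .melt(id_vars='name', var_name='year', value_name='value')
            .sort_values(['name', 'year'])
            .reset_index(drop=True))
print(tidy)
```

The row order matches the output above (bar first, since pivot sorts the names).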
