I have a pandas DataFrame in which I would like to fill in some NaN values.
import pandas as pd
tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)]
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year'])
vals = ['NaN','NaN','SomeName','NaN','SomeOtherName','SomeThirdName']
df = pd.DataFrame(vals, index=index)
print(df)
                       0
Type Year
a    1990            NaN
     1994            NaN
     1996       SomeName
b    1992            NaN
     1997  SomeOtherName
c    2001  SomeThirdName
The output that I would like is:
Type Year
a    1990       SomeName
     1994       SomeName
     1996       SomeName
b    1992  SomeOtherName
     1997  SomeOtherName
c    2001  SomeThirdName
This needs to be done on a much larger DataFrame (millions of rows), where each 'Type' can have between 1 and 5 unique 'Years' and the name value is only present for the most recent year. I'm trying to avoid iterating over rows for performance reasons.
You can sort your DataFrame by index in descending order and then forward-fill it with ffill:
import pandas as pd
# Assumes the "NaN" strings have already been converted to np.nan (see the note below)
df.sort_index(level=[0, 1], ascending=False).ffill()
#                        0
# Type Year
# c    2001  SomeThirdName
# b    1997  SomeOtherName
#      1992  SomeOtherName
# a    1996       SomeName
#      1994       SomeName
#      1990       SomeName
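A minimal end-to-end sketch of the sort-then-ffill idea. Because ffill only fills real missing values, this sketch assumes the question's data is built with np.nan in place of the "NaN" strings. It also adds a final sort_index() (not part of the answer above) to put the rows back in the question's ascending order.

```python
import numpy as np
import pandas as pd

# The question's data, but with real missing values (np.nan) instead of the
# string "NaN". This substitution is an assumption so that ffill can see them.
tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = [np.nan, np.nan, 'SomeName', np.nan, 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Sort newest-first so each Type's named (most recent) row comes before its
# older rows, forward-fill, then sort back to ascending Type/Year order.
filled = df.sort_index(level=[0, 1], ascending=False).ffill().sort_index()
print(filled)
```

Both steps are vectorized, so no Python-level row iteration is involved.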
Note: The example data doesn't actually contain np.nan values but the string "NaN", so for ffill to work you need to replace the "NaN" strings with np.nan:
import numpy as np
df[0] = np.where(df[0] == "NaN", np.nan, df[0])
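A small sketch of why the conversion matters: pandas does not treat the string "NaN" as missing, so isna() finds nothing until the strings are replaced. It uses DataFrame.replace, which should be an equivalent alternative to the np.where line above for exact string matches.

```python
import numpy as np
import pandas as pd

tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])

# The question's data: the missing entries are the *string* "NaN".
df = pd.DataFrame(['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName'],
                  index=index)

# The string "NaN" is an ordinary value to pandas, so nothing counts as missing.
n_missing_before = int(df[0].isna().sum())  # 0

# Replace exact "NaN" strings with real missing values.
df = df.replace('NaN', np.nan)
n_missing_after = int(df[0].isna().sum())   # 3
```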
Or, as @ayhan suggested, after replacing the string "NaN" with np.nan, use df.bfill().
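Both the descending-sort ffill and a plain bfill fill across the whole frame, so if a Type's most recent year were ever missing its name, the value from the neighbouring Type would leak in. The question says this does not happen, but a variant that scopes the fill to each Type with groupby(level='Type').bfill() guards against it. This is a sketch, not the answer's method; the extra Type 'a2' is hypothetical and exists only to show the difference.

```python
import numpy as np
import pandas as pd

# The question's data with real np.nan, plus a hypothetical Type 'a2' whose
# only row has no name, to show whether values leak between Types.
tuples = [('a', 1990), ('a', 1994), ('a', 1996), ('a2', 1995),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = [np.nan, np.nan, 'SomeName', np.nan, np.nan, 'SomeOtherName', 'SomeThirdName']

# Ascending order puts each Type's named (most recent) year last in its group.
df = pd.DataFrame(vals, index=index).sort_index()

# Backward-fill within each Type only; 'a2' has nothing to copy, so it stays NaN.
filled = df.groupby(level='Type').bfill()
print(filled)

# For comparison, a whole-frame bfill would copy 'b''s name into 'a2'.
leaky = df.bfill()
```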