AJG519

Reputation: 3379

Pandas data frame fill null values with index

I have a dataframe where for one column I want to fill null values with the index value. What is the best way of doing this?

Say my dataframe looks like this:

>>> import numpy as np
>>> import pandas as pd
>>> d=pd.DataFrame(index=['A','B','C'], columns=['Num','Name'], data=[[1,'Andrew'], [2, np.nan], [3, 'Chris']])
>>> print(d)

   Num    Name
A    1  Andrew
B    2     NaN
C    3   Chris

I can use the following line of code to get what I'm looking for:

d['Name'][d['Name'].isnull()]=d.index

However, I get the following warning: "A value is trying to be set on a copy of a slice from a DataFrame"

I imagine it'd be better to do this either using fillna or loc, but I can't figure out how to do this with either. I have tried the following:

>>> d['Name']=d['Name'].fillna(d.index)

>>> d.loc[d['Name'].isnull()]=d.index

Any suggestions on which is the best option?

Upvotes: 10

Views: 10419

Answers (2)

EdChum

Reputation: 394389

IMO you should use fillna. An Index is not an acceptable type for the fill value, so you need to pass a Series instead. Index has a to_series method for this:

In [13]:
d=pd.DataFrame(index=['A','B','C'], columns=['Num','Name'], data=[[1,'Andrew'], [2, np.nan], [3, 'Chris']])
d['Name']=d['Name'].fillna(d.index.to_series())
d

Out[13]:
   Num    Name
A    1  Andrew
B    2       B
C    3   Chris
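
To see why passing a Series works, here is a minimal sketch, assuming a reasonably recent pandas. The names `fill` and `filled` are only for illustration. `fillna` aligns the fill Series with the column by label, so each missing entry takes the value stored under its own index label and existing values are left alone.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(
    index=['A', 'B', 'C'],
    columns=['Num', 'Name'],
    data=[[1, 'Andrew'], [2, np.nan], [3, 'Chris']],
)

# to_series() returns a Series whose labels and values are both the index.
fill = d.index.to_series()

# fillna matches on labels: only the missing row 'B' is filled, with 'B'.
filled = d['Name'].fillna(fill)
print(filled.tolist())
```

Assigning `filled` back to `d['Name']` gives the same result as the one-liner above.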

Upvotes: 13

user2734178

Reputation: 227

I would use .loc in this situation like this:

d.loc[d['Name'].isnull(), 'Name'] = d.loc[d['Name'].isnull()].index
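
A slightly tidier variant of the same idea, as a sketch rather than the only way to write it: compute the null mask once and reuse it on both sides. The name `mask` is just illustrative. Because the row and column are selected in a single `.loc` call, this avoids the chained assignment that triggers the "copy of a slice" warning.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(
    index=['A', 'B', 'C'],
    columns=['Num', 'Name'],
    data=[[1, 'Andrew'], [2, np.nan], [3, 'Chris']],
)

# Boolean mask of the rows where 'Name' is missing, computed once.
mask = d['Name'].isnull()

# Select rows and column together, then assign the matching index labels.
d.loc[mask, 'Name'] = d.index[mask.to_numpy()]
print(d)
```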

Upvotes: 5
