Reputation: 930
I feel like this question has been asked a millions times before, but I just can't seem to get it to work or find a SO-post answering my question.
So I am selecting a subset of a pandas DataFrame and want to change these values individually.
I am subselecting my DataFrame like this:
df.loc[df[key].isnull(), [keys]]
which works perfectly. If I try and set all values to the same value such as
df.loc[df[key].isnull(), [keys]] = 5
it works as well. But if I try and set it to a DataFrame it does not, however no error is produced either.
So for example I have a DataFrame:
data = [['Alex',10,0,0,2],['Bob',12,0,0,1],['Clarke',13,0,0,4],['Dennis',64,2],['Jennifer',56,1],['Tom',95,5],['Ellen',42,2],['Heather',31,3]]
df1 = pd.DataFrame(data,columns=['Name','Age','Amount_of_cars','cars_per_year','some_other_value'])
Name Age Amount_of_cars cars_per_year some_other_value
0 Alex 10 0 0.0 2.0
1 Bob 12 0 0.0 1.0
2 Clarke 13 0 0.0 4.0
3 Dennis 64 2 NaN NaN
4 Jennifer 56 1 NaN NaN
5 Tom 95 5 NaN NaN
6 Ellen 42 2 NaN NaN
7 Heather 31 3 NaN NaN
and a second DataFrame:
data = [[2/64,5],[1/56,1],[5/95,7],[2/42,5],[3/31,7]]
df2 = pd.DataFrame(data,columns=['cars_per_year','some_other_value'])
cars_per_year some_other_value
0 0.031250 5
1 0.017857 1
2 0.052632 7
3 0.047619 5
4 0.096774 7
and I would like to replace those nans
with the second DataFrame
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2
Unfortunately this does not work as the index does not match. So how do I ignore the index, when setting values?
Any help would be appreciated. Sorry if this has been posted before.
Upvotes: 1
Views: 245
Reputation: 862531
It is possible only if number of mising values is same like number of rows in df2, then assign array for prevent index alignment:
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values
print (df1)
Name Age Amount_of_cars cars_per_year some_other_value
0 Alex 10 0 0.000000 2.0
1 Bob 12 0 0.000000 1.0
2 Clarke 13 0 0.000000 4.0
3 Dennis 64 2 0.031250 5.0
4 Jennifer 56 1 0.017857 1.0
5 Tom 95 5 0.052632 7.0
6 Ellen 42 2 0.047619 5.0
7 Heather 31 3 0.096774 7.0
If not, get errors like:
#4 rows assigned to 5 rows
data = [[2/64,5],[1/56,1],[5/95,7],[2/42,5]]
df2 = pd.DataFrame(data,columns=['cars_per_year','some_other_value'])
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values
ValueError: shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (5,)
Another idea is set index of df2
by index of filtered rows in df1
:
df2 = df2.set_index(df1.index[df1['cars_per_year'].isnull()])
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2
print (df1)
Name Age Amount_of_cars cars_per_year some_other_value
0 Alex 10 0 0.000000 2.0
1 Bob 12 0 0.000000 1.0
2 Clarke 13 0 0.000000 4.0
3 Dennis 64 2 0.031250 5.0
4 Jennifer 56 1 0.017857 1.0
5 Tom 95 5 0.052632 7.0
6 Ellen 42 2 0.047619 5.0
7 Heather 31 3 0.096774 7.0
Upvotes: 2
Reputation: 18647
Just add .values
or .to_numpy()
if using pandas v 0.24 +
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values
Name Age Amount_of_cars cars_per_year some_other_value
0 Alex 10 0 0.000000 2.0
1 Bob 12 0 0.000000 1.0
2 Clarke 13 0 0.000000 4.0
3 Dennis 64 2 0.031250 5.0
4 Jennifer 56 1 0.017857 1.0
5 Tom 95 5 0.052632 7.0
6 Ellen 42 2 0.047619 5.0
7 Heather 31 3 0.096774 7.0
Upvotes: 2