Nils
Nils

Reputation: 930

Setting subset of a pandas DataFrame by a DataFrame

I feel like this question has been asked a millions times before, but I just can't seem to get it to work or find a SO-post answering my question.

So I am selecting a subset of a pandas DataFrame and want to change these values individually.

I am subselecting my DataFrame like this:

df.loc[df[key].isnull(), [keys]]

which works perfectly. If I try and set all values to the same value such as

df.loc[df[key].isnull(), [keys]] = 5

it works as well. But if I try and set it to a DataFrame it does not, however no error is produced either.

So for example I have a DataFrame:

data = [['Alex',10,0,0,2],['Bob',12,0,0,1],['Clarke',13,0,0,4],['Dennis',64,2],['Jennifer',56,1],['Tom',95,5],['Ellen',42,2],['Heather',31,3]]
df1 = pd.DataFrame(data,columns=['Name','Age','Amount_of_cars','cars_per_year','some_other_value'])

       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0            0.0               2.0
1       Bob   12               0            0.0               1.0
2    Clarke   13               0            0.0               4.0
3    Dennis   64               2            NaN               NaN
4  Jennifer   56               1            NaN               NaN
5       Tom   95               5            NaN               NaN
6     Ellen   42               2            NaN               NaN
7   Heather   31               3            NaN               NaN

and a second DataFrame:

data = [[2/64,5],[1/56,1],[5/95,7],[2/42,5],[3/31,7]]
df2 = pd.DataFrame(data,columns=['cars_per_year','some_other_value'])

   cars_per_year  some_other_value
0       0.031250                 5
1       0.017857                 1
2       0.052632                 7
3       0.047619                 5
4       0.096774                 7

and I would like to replace those nans with the second DataFrame

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2

Unfortunately this does not work as the index does not match. So how do I ignore the index, when setting values?

Any help would be appreciated. Sorry if this has been posted before.

Upvotes: 1

Views: 245

Answers (2)

jezrael
jezrael

Reputation: 862531

It is possible only if number of mising values is same like number of rows in df2, then assign array for prevent index alignment:

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values
print (df1)
       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0       0.000000               2.0
1       Bob   12               0       0.000000               1.0
2    Clarke   13               0       0.000000               4.0
3    Dennis   64               2       0.031250               5.0
4  Jennifer   56               1       0.017857               1.0
5       Tom   95               5       0.052632               7.0
6     Ellen   42               2       0.047619               5.0
7   Heather   31               3       0.096774               7.0

If not, get errors like:

#4 rows assigned to 5 rows
data = [[2/64,5],[1/56,1],[5/95,7],[2/42,5]]
df2 = pd.DataFrame(data,columns=['cars_per_year','some_other_value'])

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values

ValueError: shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (5,)


Another idea is set index of df2 by index of filtered rows in df1:

df2 = df2.set_index(df1.index[df1['cars_per_year'].isnull()])
df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2
print (df1)
       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0       0.000000               2.0
1       Bob   12               0       0.000000               1.0
2    Clarke   13               0       0.000000               4.0
3    Dennis   64               2       0.031250               5.0
4  Jennifer   56               1       0.017857               1.0
5       Tom   95               5       0.052632               7.0
6     Ellen   42               2       0.047619               5.0
7   Heather   31               3       0.096774               7.0

Upvotes: 2

Chris Adams
Chris Adams

Reputation: 18647

Just add .values or .to_numpy() if using pandas v 0.24 +

df1.loc[df1['cars_per_year'].isnull(),['cars_per_year','some_other_value']] = df2.values

       Name  Age  Amount_of_cars  cars_per_year  some_other_value
0      Alex   10               0       0.000000               2.0
1       Bob   12               0       0.000000               1.0
2    Clarke   13               0       0.000000               4.0
3    Dennis   64               2       0.031250               5.0
4  Jennifer   56               1       0.017857               1.0
5       Tom   95               5       0.052632               7.0
6     Ellen   42               2       0.047619               5.0
7   Heather   31               3       0.096774               7.0

Upvotes: 2

Related Questions