Reputation: 557
import pandas as pd
some_dict = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]}
df = pd.DataFrame(some_dict)
mask1 = pd.Series([False, True, False, True])
df['c'] = df.loc[mask1, 'a']
This will create a new column c with NaN values where mask1 is False, and the values from column a where mask1 is True. Now I add b to c, on a different condition:
mask2 = pd.Series([True, False, False, False])
df['c'] += df.loc[mask2, 'b']
However, this sets ALL values to NaN. I'm guessing this is because NaN + 0 and 0 + NaN are both equal to NaN. I tried solving this by writing
df['c'] = df.loc[mask1, 'a'].fillna(0)
df['c'] = df.loc[mask2, 'b'].fillna(0)
Why is replacing NaN with 0 through fillna() not working?
Upvotes: 1
Views: 1667
Reputation: 862771
If you check how it works, you can see that the selection contains only the matched rows; missing values are added later, and only for the rows that did not match (the False values in the mask):
print (df.loc[mask1, 'a'])
1    2
3    4
Name: a, dtype: int64
So fillna(0) has nothing to replace at this point, because the selection contains no missing values:
print (df.loc[mask1, 'a'].fillna(0))
1    2
3    4
Name: a, dtype: int64
When you assign the selection to a column, the unmatched rows have to be created too, and because pandas has no idea what should go there, it creates NaNs:
df['c'] = df.loc[mask1, 'a']
print (df)
   a  b    c
0  1  5  NaN <- False
1  2  6  2.0
2  3  7  NaN <- False
3  4  8  4.0
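The alignment step described above can be made explicit. The following is a minimal sketch, not part of the original answer: the names selected, aligned and total are only illustrative, and reindex is used here as a stand-in for what the column assignment effectively does. It also shows why the += in the question ends up with NaN in every row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
mask1 = pd.Series([False, True, False, True])
mask2 = pd.Series([True, False, False, False])

# The selection keeps only the rows where mask1 is True (index 1 and 3).
selected = df.loc[mask1, 'a']

# Reindexing to the full index creates NaN for the unmatched rows 0 and 2.
aligned = selected.reindex(df.index)

# Assigning the selection to a column produces the same pattern of NaNs.
df['c'] = selected

# Addition aligns on the index too: every row is missing on at least one
# side, so every row of the result is NaN.
total = df['c'] + df.loc[mask2, 'b']
print(total)
```

Row 0 is NaN in c, and rows 1 to 3 are absent from the b selection, which is why no row survives the addition.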
So if you need 0 instead of NaN, use numpy.where: where the mask is True it takes the values from a, and where it is False it takes 0:
df['c'] = np.where(mask1, df['a'], 0)
print (df)
   a  b  c
0  1  5  0
1  2  6  2
2  3  7  0
3  4  8  4
Another pandas alternative is Series.where:
df['c'] = df['a'].where(mask1, 0)
print (df)
   a  b  c
0  1  5  0
1  2  6  2
2  3  7  0
3  4  8  4
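The two approaches give the same values here, but they return different types, which may matter if you keep working with the result before assigning it. A small sketch for comparison (the names via_numpy and via_pandas are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
mask1 = pd.Series([False, True, False, True])

# numpy.where returns a plain array without an index.
via_numpy = np.where(mask1, df['a'], 0)

# Series.where returns a Series that keeps the index of df['a'].
via_pandas = df['a'].where(mask1, 0)

print(type(via_numpy).__name__, via_numpy.tolist())
print(type(via_pandas).__name__, via_pandas.tolist())
```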
All together:
import numpy as np
import pandas as pd
some_dict = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]}
df = pd.DataFrame(some_dict)
mask1 = pd.Series([False, True, False, True])
df['c'] = np.where(mask1, df['a'], 0)
mask2 = pd.Series([True, False, False, False])
df['c'] += np.where(mask2, df['b'], 0)
print (df)
   a  b  c
0  1  5  5
1  2  6  2
2  3  7  0
3  4  8  4
Another idea is to use Series.add with fill_value=0:
some_dict = {'a': [1,2,3,4], 'b': [5,6,7,8],}
df = pd.DataFrame(some_dict)
mask1 = pd.Series([False, True, False, True])
df['c'] = df.loc[mask1, 'a']
print (df)
   a  b    c
0  1  5  NaN
1  2  6  2.0
2  3  7  NaN
3  4  8  4.0
mask2 = pd.Series([True, False, False, False])
df['c'] = df['c'].add(df.loc[mask2, 'b'], fill_value=0)
print (df)
   a  b    c
0  1  5  5.0
1  2  6  2.0
2  3  7  NaN
3  4  8  4.0
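Row 2 stays NaN because it is missing on both sides, and fill_value only applies when just one side is missing. If you want 0 there as well, one possible follow-up (a sketch, assuming a final fillna and a cast back to integers are acceptable for your data) is:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
mask1 = pd.Series([False, True, False, True])
mask2 = pd.Series([True, False, False, False])

df['c'] = df.loc[mask1, 'a']
df['c'] = df['c'].add(df.loc[mask2, 'b'], fill_value=0)

# Row 2 is missing in both operands, so it is still NaN at this point.
# Now the column really does contain NaN, so fillna has something to replace.
df['c'] = df['c'].fillna(0).astype('int64')
print(df)
```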
Upvotes: 2
Reputation: 26221
Try to always match the lvalue's index with the rvalue's. Otherwise, pandas does its best to reindex the two before doing your operation.
Also, always explicitly set a new column to some default value of your choice, e.g. 0, before doing partial (masked) assignments. This also gives you the correct dtype.
So, concretely:
import pandas as pd
some_dict = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]}
df = pd.DataFrame(some_dict)
mask1 = pd.Series([False, True, False, True])
df['c'] = 0
df.loc[mask1, 'c'] = df.loc[mask1, 'a']
mask2 = pd.Series([True, False, False, False])
df.loc[mask2, 'c'] += df.loc[mask2, 'b']
print(df)
   a  b  c
0  1  5  5
1  2  6  2
2  3  7  0
3  4  8  4
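To illustrate the dtype remark, here is a small comparison of the two ways of creating the column. It is only a sketch; the column names partial and preset are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})
mask1 = pd.Series([False, True, False, True])

# Assigning a partial selection: the unmatched rows become NaN,
# which forces the column to float64.
df['partial'] = df.loc[mask1, 'a']

# Setting a default first, then assigning through the same mask on both
# sides: the indexes match and the column stays int64.
df['preset'] = 0
df.loc[mask1, 'preset'] = df.loc[mask1, 'a']

print(df.dtypes)
```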
Upvotes: 1