Reputation: 89
I'm struggling with something that seemed to be trivial but apparently isn't. General picture: data
- pandas dataframe - contains(among others) TOTAL_VISITS
and NUM_PRINTS
columns.
Objective: given num_prints
parameter, find rows where NUM_prints = num_prints
and fill nan
s with a given number.
Where I stopped and it didn't make sense anymore:
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'].fillna(5,inplace=True)
This should work as much as I know and read. didn't fill nans with anything in practice, seemed like it worked with a copy or something as it didn't change anything in the original object.
What works:
data.loc[indices,'TOTAL_VISITS'] = 2
this does fill the column with 2's on the num_print
condition, but does not consider nans.
data['TOTAL_VISITS'].fillna(0, inplace=True)
this does fill the nans in total visits with 0, but does not consider the num_prints
condition.
I'm kinda hopeless as a regular for
loop with .iloc
and conditions takes way too long than I can handle.
Upvotes: 6
Views: 7482
Reputation: 459
I think this one works fine
data['TOTAL_VISITS'] = np.where(data['NUM_PRINTS'] == 1, 100, data['TOTAL_VISITS'])
Upvotes: 3
Reputation: 164623
One way to apply your logic, without using fillna
is to define a mask including your criteria. Then update a series via .loc
using this mask.
The below example works with data from @jezrael.
num_prints = 1
mask = (data['NUM_PRINTS'] == num_prints) & data['TOTAL_VISITS'].isnull()
data.loc[mask, 'TOTAL_VISITS'] = 100
Upvotes: 0
Reputation: 862511
I think need filter in both sides and apply fillna
only for filtered rows:
np.random.seed(1213)
c = ['TOTAL_VISITS', 'A', 'NUM_PRINTS']
data = pd.DataFrame(np.random.choice([1,np.nan,3,4], size=(10,3)), columns=c)
print (data)
TOTAL_VISITS A NUM_PRINTS
0 1.0 4.0 4.0
1 NaN 3.0 1.0
2 1.0 1.0 1.0
3 4.0 3.0 3.0
4 1.0 3.0 4.0
5 4.0 4.0 3.0
6 4.0 1.0 4.0
7 NaN 4.0 3.0
8 NaN NaN 3.0
9 3.0 NaN 1.0
num_prints = 1
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'] = data.loc[indices,'TOTAL_VISITS'].fillna(100)
#alternative
#data.loc[indices,'TOTAL_VISITS'] = data['TOTAL_VISITS'].fillna(100)
print (data)
TOTAL_VISITS A NUM_PRINTS
0 1.0 4.0 4.0
1 100.0 3.0 1.0
2 1.0 1.0 1.0
3 4.0 3.0 3.0
4 1.0 3.0 4.0
5 4.0 4.0 3.0
6 4.0 1.0 4.0
7 NaN 4.0 3.0
8 NaN NaN 3.0
9 3.0 NaN 1.0
Upvotes: 2