user9548409
user9548409

Reputation: 89

pandas: fill nans given a condition

I'm struggling with something that seemed to be trivial but apparently isn't. General picture: data - pandas dataframe - contains(among others) TOTAL_VISITS and NUM_PRINTS columns.

Objective: given num_prints parameter, find rows where NUM_prints = num_prints and fill nans with a given number.

Where I stopped and it didn't make sense anymore:

indices= data['NUM_PRINTS'] == num_prints

data.loc[indices,'TOTAL_VISITS'].fillna(5,inplace=True)

This should work as much as I know and read. didn't fill nans with anything in practice, seemed like it worked with a copy or something as it didn't change anything in the original object.

What works:

data.loc[indices,'TOTAL_VISITS'] = 2

this does fill the column with 2's on the num_print condition, but does not consider nans.

data['TOTAL_VISITS'].fillna(0, inplace=True)

this does fill the nans in total visits with 0, but does not consider the num_prints condition.

I'm kinda hopeless as a regular for loop with .iloc and conditions takes way too long than I can handle.

Upvotes: 6

Views: 7482

Answers (3)

Shashank Singh Yadav
Shashank Singh Yadav

Reputation: 459

I think this one works fine

data['TOTAL_VISITS'] = np.where(data['NUM_PRINTS'] == 1, 100, data['TOTAL_VISITS'])

Upvotes: 3

jpp
jpp

Reputation: 164623

One way to apply your logic, without using fillna is to define a mask including your criteria. Then update a series via .loc using this mask.

The below example works with data from @jezrael.

num_prints = 1

mask = (data['NUM_PRINTS'] == num_prints) & data['TOTAL_VISITS'].isnull()

data.loc[mask, 'TOTAL_VISITS'] = 100

Upvotes: 0

jezrael
jezrael

Reputation: 862511

I think need filter in both sides and apply fillna only for filtered rows:

np.random.seed(1213)

c = ['TOTAL_VISITS', 'A', 'NUM_PRINTS']
data = pd.DataFrame(np.random.choice([1,np.nan,3,4], size=(10,3)), columns=c)
print (data)
   TOTAL_VISITS    A  NUM_PRINTS
0           1.0  4.0         4.0
1           NaN  3.0         1.0
2           1.0  1.0         1.0
3           4.0  3.0         3.0
4           1.0  3.0         4.0
5           4.0  4.0         3.0
6           4.0  1.0         4.0
7           NaN  4.0         3.0
8           NaN  NaN         3.0
9           3.0  NaN         1.0


num_prints = 1
indices= data['NUM_PRINTS'] == num_prints
data.loc[indices,'TOTAL_VISITS'] = data.loc[indices,'TOTAL_VISITS'].fillna(100)
#alternative
#data.loc[indices,'TOTAL_VISITS'] = data['TOTAL_VISITS'].fillna(100)
print (data)
   TOTAL_VISITS    A  NUM_PRINTS
0           1.0  4.0         4.0
1         100.0  3.0         1.0
2           1.0  1.0         1.0
3           4.0  3.0         3.0
4           1.0  3.0         4.0
5           4.0  4.0         3.0
6           4.0  1.0         4.0
7           NaN  4.0         3.0
8           NaN  NaN         3.0
9           3.0  NaN         1.0

Upvotes: 2

Related Questions