user1610719

Reputation: 1303

Pandas groupby: fill missing values from other group members

I think this is best shown with an example. What I'm trying to do is find the non-null number from a group and propagate it to the rest of the group.

In [52]: df = pd.DataFrame.from_dict({1:{'i_id': 2, 'i_num':1}, 2: {'i_id': 2, 'i_num': np.nan}, 3: {'i_id': 2, 'i_num': np.nan}, 4: {'i_id': 3, 'i_num': np.nan}, 5: {'i_id': 3, 'i_num': 5}}, orient='index')

In [53]: df
Out[53]:
   i_num  i_id
1      1     2
2    NaN     2
3    NaN     2
4    NaN     3
5      5     3

The DataFrame would look something like this. What I want is to take all the i_id == 2 and make their i_num == 1, and all the i_id == 3, and make their i_num == 5 (so both matching their non-null group neighbors).

So the end result would be this:

   i_num  i_id
1      1     2
2      1     2
3      1     2
4      5     3
5      5     3

Upvotes: 2

Views: 2920

Answers (1)

Alex Riley

Reputation: 176770

The groupby method first() returns the first non-null value in each group. You can broadcast that value to every row of its group with transform:

df['i_num'] = df.groupby('i_id')['i_num'].transform('first')

This produces the column as required:

   i_num  i_id
1      1     2
2      1     2
3      1     2
4      5     3
5      5     3

Bear in mind that this replaces every value in the group with the group's first non-null value, not just the NaN values (though that seems to be what you're looking for here).

Alternatively, to preserve any other non-null values in the group, you can use fillna in the following way:

# make a column of first values for each group
x = df['i_id'].map(df.groupby('i_id')['i_num'].first())
# fill only NaN values using new column x
df['i_num'] = df['i_num'].fillna(x)
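
To make the difference between the two approaches concrete, here is a minimal sketch. It assumes a variant of the question's data in which group 2 has a second non-null value (7); that value is made up for illustration and is not in the original frame. The names overwritten, first_per_group and filled are likewise just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's frame: row 3 holds 7 instead of NaN.
df = pd.DataFrame(
    {"i_id": [2, 2, 2, 3, 3], "i_num": [1.0, np.nan, 7.0, np.nan, 5.0]},
    index=[1, 2, 3, 4, 5],
)

# Approach 1: every row gets its group's first non-null value,
# so the existing 7 in group 2 is overwritten.
overwritten = df.groupby("i_id")["i_num"].transform("first")

# Approach 2: build a per-row column of group firsts, then fill only the NaNs,
# so the existing 7 in group 2 is kept.
first_per_group = df["i_id"].map(df.groupby("i_id")["i_num"].first())
filled = df["i_num"].fillna(first_per_group)

print(overwritten.tolist())  # [1.0, 1.0, 1.0, 5.0, 5.0]
print(filled.tolist())       # [1.0, 1.0, 7.0, 5.0, 5.0]
```

On the question's original data, where each group has only one non-null value, both approaches should give the same result.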

Upvotes: 5
