windweller

Reputation: 2385

Find First occurrence of a user and assign values to it

Here's what my data look like:

user_id prior_elapse_time timestamp
115 NaN 0
115 10 1000
115 5 2000
222212 NaN 0
222212 8 500
222212 12 3000
222212 NaN 5000
222212 15 8000

I found similar posts that teach me how to get the first occurrence of a user:

train_df.groupby('user_id')['prior_elapse_time'].first()

This nicely gets me the first appearance of each user. However, I'm at a loss as to how to assign 0 to the NaN only at each user's first occurrence. Because of a logging error, NaN also appears elsewhere (for example, user 222212 at timestamp 5000), but I only want to replace the NaN on each user's first row (the rows with timestamp 0 in the table above).
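
One detail worth checking here: GroupBy.first() returns the first non-null value in each group, so it skips exactly the NaNs in question. The sketch below rebuilds the sample table (column names taken from it) and contrasts first() with groupby().head(1), which keeps each user's actual first row, NaN included, along with its original index label.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question
df = pd.DataFrame({
    "user_id": [115, 115, 115, 222212, 222212, 222212, 222212, 222212],
    "prior_elapse_time": [np.nan, 10, 5, np.nan, 8, 12, np.nan, 15],
    "timestamp": [0, 1000, 2000, 0, 500, 3000, 5000, 8000],
})

# first() skips NaN by default, so it reports 10.0 and 8.0 rather than NaN
first_non_null = df.groupby("user_id")["prior_elapse_time"].first()

# head(1) keeps the real first row of each user, with its original index
first_rows = df.groupby("user_id").head(1)

print(first_non_null)
print(first_rows.index.tolist())
```

The index labels from head(1) are what you need for a later df.loc assignment.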

I also tried

train_df['prior_elapse_time'][(train_df['prior_elapse_time'].isna()) & (train_df['timestamp'] == 0)] = 0

But then I get the SettingWithCopyWarning about assigning to a "copy" vs. a "view" (which I don't fully understand).
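
A minimal sketch of why the warning appears and the usual fix: train_df['col'][mask] = 0 is chained indexing, which first builds an intermediate object and then assigns into it, and pandas cannot guarantee that object is a view of the original frame (in newer pandas with Copy-on-Write, the chained form does not update the frame at all). A single .loc call with a row mask and a column label writes into the frame directly. This keeps the question's own condition, so it still assumes timestamp == 0 marks each user's first row, which holds in the sample but may not in the real data.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question
df = pd.DataFrame({
    "user_id": [115, 115, 115, 222212, 222212, 222212, 222212, 222212],
    "prior_elapse_time": [np.nan, 10, 5, np.nan, 8, 12, np.nan, 15],
    "timestamp": [0, 1000, 2000, 0, 500, 3000, 5000, 8000],
})

# Same condition as the attempt above, but assigned with one .loc call
mask = df["prior_elapse_time"].isna() & (df["timestamp"] == 0)
df.loc[mask, "prior_elapse_time"] = 0

print(df)
```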

Any help?

Answers (1)

Sayandip Dutta

Reputation: 15872

If your df is sorted by user_id (more precisely, if each user's rows are contiguous):

>>> df.loc[df.user_id.diff().ne(0), 'prior_elapse_time'] = 0
>>> df
   user_id  prior_elapse_time  timestamp
0      115                0.0          0
1      115               10.0       1000
2      115                5.0       2000
3   222212                0.0          0
4   222212                8.0        500
5   222212               12.0       3000
6   222212                NaN       5000
7   222212               15.0       8000
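
The diff() trick works because diff() is NaN on the first row (and NaN.ne(0) is True) and non-zero wherever user_id changes, so it flags the start of every contiguous run of a user. The sketch below uses a small hypothetical frame where one user's rows are split, to show that the trick then flags more than one row per user, and that a stable sort by user_id beforehand restores the intended behaviour while keeping each user's rows in their original order.

```python
import pandas as pd

# Hypothetical unsorted frame: user 115's rows are separated by a 222212 row
df = pd.DataFrame({
    "user_id": [115, 222212, 115],
    "prior_elapse_time": [float("nan"), float("nan"), 10.0],
    "timestamp": [0, 0, 1000],
})

# On this order every row starts a new run, so all three are flagged
runs = df["user_id"].diff().ne(0)

# A stable sort groups each user's rows without reordering them within the user
sorted_df = df.sort_values("user_id", kind="stable")
is_first = sorted_df["user_id"].diff().ne(0)
sorted_df.loc[is_first, "prior_elapse_time"] = 0

print(sorted_df)
```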

Alternatively, use pandas.Series.mask:

>>> df['prior_elapse_time'] = df.prior_elapse_time.mask(df.user_id.diff().ne(0), 0)
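
For reference, a self-contained version of the mask approach on the sample data. Series.mask returns a new Series with values replaced where the condition is True, so nothing changes until the result is assigned back; Series.where is its complement (it keeps values where the condition is True), so where(~cond, 0) gives the same result.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question
df = pd.DataFrame({
    "user_id": [115, 115, 115, 222212, 222212, 222212, 222212, 222212],
    "prior_elapse_time": [np.nan, 10, 5, np.nan, 8, 12, np.nan, 15],
    "timestamp": [0, 1000, 2000, 0, 500, 3000, 5000, 8000],
})

is_first = df["user_id"].diff().ne(0)

# mask() replaces values where is_first is True and returns a new Series
masked = df["prior_elapse_time"].mask(is_first, 0)
# where() keeps values where its condition is True, so invert the condition
same = df["prior_elapse_time"].where(~is_first, 0)

# Assign back to actually update the frame
df["prior_elapse_time"] = masked
```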

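If it is not sorted, get the index labels of each user's first row via groupby (this assumes an unnamed index, so that reset_index() produces a column called 'index'):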

>>> idx = df.reset_index().groupby('user_id')['index'].first()
>>> df.loc[idx, 'prior_elapse_time'] = 0
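
A swapped-in alternative for the unsorted case (not the approach above): Series.duplicated() is False on the first occurrence of each value in row order, so its negation flags one row per user without sorting or reset_index(). The frame below is a hypothetical interleaved version of the sample. Note that "first" here means first in the current row order, so if row order does not follow time, sort by timestamp beforehand.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted frame: rows of the two users interleave
df = pd.DataFrame({
    "user_id": [222212, 115, 222212, 115, 222212],
    "prior_elapse_time": [np.nan, np.nan, 8, 10, np.nan],
    "timestamp": [0, 0, 500, 1000, 5000],
})

# True only on the first row (in row order) of each user_id
is_first = ~df["user_id"].duplicated()
df.loc[is_first, "prior_elapse_time"] = 0

print(df)
```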

If you want to set 0 only where the value was previously NaN, combine the first-row condition with a pandas.Series.isnull mask:

>>> df.loc[
...     df.user_id.diff().ne(0) & df.prior_elapse_time.isnull(),
...     'prior_elapse_time'
... ] = 0
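
A runnable check of the combined condition. The sample data is extended with a hypothetical user 300000 whose first value is already logged (appended last so the rows stay sorted by user_id), to show that only first rows that are also NaN get replaced, while the NaN from the logging error at timestamp 5000 is left alone.

```python
import numpy as np
import pandas as pd

# Sample data plus a hypothetical user 300000 with a non-null first value
df = pd.DataFrame({
    "user_id": [115, 115, 115, 222212, 222212, 222212, 222212, 222212, 300000],
    "prior_elapse_time": [np.nan, 10, 5, np.nan, 8, 12, np.nan, 15, 7],
    "timestamp": [0, 1000, 2000, 0, 500, 3000, 5000, 8000, 0],
})

# Both must hold: first row of the user AND value currently missing
is_first = df["user_id"].diff().ne(0)
df.loc[is_first & df["prior_elapse_time"].isnull(), "prior_elapse_time"] = 0

print(df)
```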
