Reputation: 5660
I want to ensure that the first value of val2 corresponding to each vintage is NaN. Currently two of them are already NaN, but I want the 0.53 to change to NaN as well.
import numpy as np
import pandas as pd

df = pd.DataFrame({
'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})
Here's what I've tried so far:
df.groupby('vintage').first().val2 #This gives the first non-NaN values, as shown below
vintage
2017-01-01 0.66
2017-02-01 0.53
2017-03-01 NaN
df.groupby('vintage').first().val2 = np.nan #This doesn't change anything
df.val2
0 NaN
1 0.66
2 0.81
3 0.53
4 0.62
5 NaN
Upvotes: 14
Views: 10883
Reputation: 2554
Timings:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})
def BENY(df):
    df.loc[df.groupby('vintage').vintage.cumcount() == 0, 'val2'] = np.nan

def EdChum(df):
    df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan

def knoble(df):
    def func(x):
        x['val2'].iloc[0] = np.nan
        return x
    df.groupby("vintage", group_keys=False).apply(func)
%timeit BENY(df)
406 µs ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit EdChum(df)
454 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit knoble(df)
1.07 ms ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
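One caveat (my observation, not part of the timings above): each of these functions mutates df in place, so after the first %timeit run the first values per group are already NaN, and later runs measure writes to an already-modified frame. A fairer sketch would hand each run a fresh copy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

def BENY(df):
    # set val2 to NaN on the first row of each vintage group
    df.loc[df.groupby('vintage').vintage.cumcount() == 0, 'val2'] = np.nan

# %timeit BENY(df.copy())  # each run then starts from the unmodified frame
```

This adds the cost of the copy to every measurement, but it is the same constant for all three candidates.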
Upvotes: 1
Reputation: 11
I think you could also write:
def h(x):
    x['val2'].iloc[0] = np.nan
    return x
df = df.groupby("vintage").apply(h)
Upvotes: 1
Reputation: 323226
Or create the position within each group with cumcount, pick the first one, and change val2 to np.nan:
df.loc[df.groupby('vintage').vintage.cumcount()==0,'val2']=np.nan
df
Out[154]:
date val1 val2 vintage
0 2017-01-01 0.59 NaN 2017-01-01
1 2017-02-01 0.68 0.66 2017-01-01
2 2017-03-01 0.80 0.81 2017-01-01
3 2017-02-01 0.54 NaN 2017-02-01
4 2017-03-01 0.61 0.62 2017-02-01
5 2017-03-01 0.60 NaN 2017-03-01
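To see why this works, here is the intermediate cumcount (the row's position within its group), checked against the same df:

```python
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
})

# cumcount numbers the rows within each vintage group, starting at 0
pos = df.groupby('vintage').vintage.cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 0]
```

Comparing pos == 0 therefore selects exactly the first row of each group, regardless of whether its val2 is NaN.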
Upvotes: 5
Reputation: 394041
You can't assign to the result of an aggregation; also, first skips existing NaN values. What you can do is call head(1), which returns the first row for each group, and pass those indices to loc to mask the original df and overwrite the column values:
In[91]:
df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
df
Out[91]:
date val1 val2 vintage
0 2017-01-01 0.59 NaN 2017-01-01
1 2017-02-01 0.68 0.66 2017-01-01
2 2017-03-01 0.80 0.81 2017-01-01
3 2017-02-01 0.54 NaN 2017-02-01
4 2017-03-01 0.61 0.62 2017-02-01
5 2017-03-01 0.60 NaN 2017-03-01
Here you can see that head(1) returns the first row for each group:
In[94]:
df.groupby('vintage')['val2'].head(1)
Out[94]:
0 NaN
3 0.53
5 NaN
Name: val2, dtype: float64
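The index of that head(1) result is what gets passed to loc; a quick check (using the same df) that it picks out rows 0, 3 and 5:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# head(1) keeps the first row per group and preserves the original index labels
first_rows = df.groupby('vintage')['val2'].head(1)
print(list(first_rows.index))  # [0, 3, 5]
```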
Contrast this with first, which returns the first non-NaN value unless a group contains only NaN values:
In[95]:
df.groupby('vintage')['val2'].first()
Out[95]:
vintage
2017-01-01 0.66
2017-02-01 0.53
2017-03-01 NaN
Name: val2, dtype: float64
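If the frame is already sorted so that each vintage's rows are contiguous (as here), an equivalent one-liner (a sketch of mine, not part of the answer above) masks the first occurrence of each vintage with duplicated:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# ~duplicated() is True only on the first occurrence of each vintage value
df.loc[~df['vintage'].duplicated(), 'val2'] = np.nan
print(df['val2'].isna().tolist())  # [True, False, False, True, False, True]
```

Unlike the groupby approaches, this relies on the first occurrence of each vintage value being the first row of its group, so it breaks if the rows are shuffled.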
Upvotes: 26