Reputation: 60060
I have some data from an experiment, and within each trial there are some single values, surrounded by NaNs, that I want to fill out to the entire trial:
import numpy as np
import pandas as pd
df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                               np.nan, 'A1', np.nan, np.nan, np.nan]})
Out[177]:
cs_name trial
0 NaN 1
1 A1 1
2 NaN 1
3 NaN 1
4 NaN 2
5 NaN 2
6 B2 2
7 NaN 2
8 A1 3
9 NaN 3
10 NaN 3
11 NaN 3
I'm able to fill these values within the whole trial by using both bfill() and ffill(), but I'm wondering if there is a better way to achieve this.
df['cs_name'] = df.groupby('trial')['cs_name'].ffill()
df['cs_name'] = df.groupby('trial')['cs_name'].bfill()
Expected output:
cs_name trial
0 A1 1
1 A1 1
2 A1 1
3 A1 1
4 B2 2
5 B2 2
6 B2 2
7 B2 2
8 A1 3
9 A1 3
10 A1 3
11 A1 3
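For reference, the two fills can presumably be chained inside a single grouped transform, which avoids the intermediate assignment. A minimal sketch on the same sample data (the variable name `filled` is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan],
})

# Forward-fill, then back-fill, within each trial in one grouped transform.
filled = df.groupby('trial')['cs_name'].transform(lambda s: s.ffill().bfill())
df['cs_name'] = filled
print(df)
```

This is still the same ffill-plus-bfill idea, just written as one expression per group.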
Upvotes: 14
Views: 12128
Reputation: 71
If you want to avoid the error that appears when some groups contain only NaN, you could do the following (note that I changed the df so that the group with trial=1 contains only NaN):
df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
                   'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                               'A3', np.nan, np.nan, np.nan, np.nan, np.nan]})
g = df.groupby('trial')
# Preview: first valid value per trial, or a placeholder if the trial is all NaN
g['cs_name'].transform(lambda s: 'No values to aggregate' if
                       pd.isnull(s).all() else s.loc[s.first_valid_index()])
# Assign the result back to the column
df['cs_name'] = g['cs_name'].transform(lambda s: 'No values to aggregate' if
                                       pd.isnull(s).all() else s.loc[s.first_valid_index()])
This way you insert 'No values to aggregate' (or whatever you want) when a group contains only NaN, instead of getting an error.
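A variation on the same idea, as a sketch: `first_valid_index()` returns `None` when a Series has no valid entries, so you can check for that directly and leave all-NaN trials as NaN rather than writing a placeholder string. The helper name `first_valid_or_nan` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
    'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                'A3', np.nan, np.nan, np.nan, np.nan, np.nan],
})

def first_valid_or_nan(s):
    """Return the first non-null value of s, or NaN if s has none."""
    idx = s.first_valid_index()  # None when every entry is missing
    return np.nan if idx is None else s.loc[idx]

# The scalar returned for each trial is broadcast to all rows of that trial.
df['cs_name'] = df.groupby('trial')['cs_name'].transform(first_valid_or_nan)
print(df)
```

Keeping the all-NaN trial as NaN means a later `isna()` check or `fillna()` still works on it, which a string placeholder would hide.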
Hope this helps :)
Federico
Upvotes: 6
Reputation: 375445
An alternative approach is to use first_valid_index and a transform:
In [11]: g = df.groupby('trial')
In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]:
0 A1
1 A1
2 A1
3 A1
4 B2
5 B2
6 B2
7 B2
8 A1
9 A1
10 A1
11 A1
Name: cs_name, dtype: object
This ought to be more efficient than using ffill followed by a bfill...
And use this to change the cs_name column:
df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Note: I think it would be a nice enhancement to have a method to grab the first non-null object in pandas. In numpy it's an open request, and I don't think there is currently a method (I could be wrong!)...
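For the per-group case here, the grouped `first` aggregation may already do the job. As far as I know it skips NaN by default in current pandas, so passing `'first'` to `transform` should broadcast each trial's first non-null value. A sketch, assuming that default holds in your version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan],
})

# 'first' is assumed to pick the first non-null entry in each trial;
# transform then repeats that value for every row of the trial.
df['cs_name'] = df.groupby('trial')['cs_name'].transform('first')
print(df)
```

This avoids the Python-level lambda, so it is likely to be faster on large frames, though I have not benchmarked it.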
Upvotes: 15