Reputation: 129
I am trying to tackle something on Pandas but I am not sure where to start.
I have a dataframe with multiple columns, but the ones of interest for this question look like this:
df = pd.DataFrame(data = {'subject': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'val': [np.nan, 2, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 10]})
subject val
0 1 NaN
1 1 2.0
2 1 NaN
3 2 NaN
4 2 NaN
5 2 7.0
6 3 NaN
7 3 NaN
8 3 10.0
I want to create a third column which, for each subject, has the value on column val for the corresponding subject:
subject val total
0 1 NaN 2
1 1 2.0 2
2 1 NaN 2
3 2 NaN 7
4 2 NaN 7
5 2 7.0 7
6 3 NaN 10
7 3 NaN 10
8 3 10.0 10
I know I can do
df[['subject', 'val']].dropna()
to get the values for the third column, but that would lose all the other columns in the dataframe (which has different values in each row).
Thanks
Upvotes: 0
Views: 168
Reputation: 11192
try this,
df['total'] =df.groupby('subject')['val'].transform('sum')
or
df['total2'] =df.groupby('subject')['val'].transform(lambda x:x[x.notnull()].unique()) #this will remove NaN records and give you unique element in each group
Output:
subject val total total2
0 1 NaN 2.0 2.0
1 1 2.0 2.0 2.0
2 1 NaN 2.0 2.0
3 2 NaN 7.0 7.0
4 2 NaN 7.0 7.0
5 2 7.0 7.0 7.0
6 3 NaN 10.0 10.0
7 3 NaN 10.0 10.0
8 3 10.0 10.0 10.0
Upvotes: 1
Reputation: 323376
By using ffill
and bfill
df['New']=df.groupby('subject').val.apply(lambda x : x.ffill().bfill())
df
Out[257]:
subject val New
0 1 NaN 2.0
1 1 2.0 2.0
2 1 NaN 2.0
3 2 NaN 7.0
4 2 NaN 7.0
5 2 7.0 7.0
6 3 NaN 10.0
7 3 NaN 10.0
8 3 10.0 10.0
Upvotes: 1