Jammy

Reputation: 87

Pandas: groupby with apply leaves missing values in the new column

Here's what the dataframe, df, looks like. I want to group by Position and, for each group, calculate a value from its Level rows using a function I defined.

data

ID   Position   Level    Value
001     N         L         15
001     N         H         21
001     N         L         17
001     S         L         12
001     S         H         20
001     S         L         16

Here's what the function looks like:

def rule(df):
    # Value of the 'H' row in this group
    h_value = df.loc[df['Level'] == 'H', 'Value']
    # Mean Value of the 'L' rows in this group
    l_value = df.loc[df['Level'] == 'L', 'Value'].mean()

    df['dtm'] = h_value - l_value

    return df

Here's the groupby code.

new_df = df.groupby(['ID','Position']).apply(lambda x: rule(x))

But the result (new_df) shows:

ID   Position   Level    Value     dtm     expected result of dtm
001     N         L         15     NaN              5
001     N         H         21      5               5
001     N         L         17     NaN              5
001     S         L         12     NaN              6
001     S         H         20      6               6
001     S         L         16     NaN              6

How can I fill the missing values so that every row in a Position group gets the same dtm value? Thanks.

Upvotes: 1

Views: 391

Answers (1)

Henry Yik

Reputation: 22493

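h_value = df.loc[df['Level'] == 'H', "Value"] returns a pd.Series, not a scalar. Subtracting l_value from it also returns a Series, and that Series carries only the index label of the 'H' row. Assigning it to df['dtm'] aligns on the index, so only the 'H' row receives a value and every other row in the group is left as NaN.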
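To see the alignment behaviour in isolation, here is a minimal sketch on a single group. The frame named group is a hypothetical stand-in for the Position N rows from the question, and diff is just an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for one group (the Position "N" rows of the question).
group = pd.DataFrame({"Level": ["L", "H", "L"], "Value": [15, 21, 17]})

# Boolean selection returns a Series, here with the single index label 1.
h_value = group.loc[group["Level"] == "H", "Value"]
# .mean() reduces the "L" rows to a scalar.
l_value = group.loc[group["Level"] == "L", "Value"].mean()

# Series minus scalar is still a Series with the single index label 1.
diff = h_value - l_value

# Column assignment aligns on the index: rows 0 and 2 have no match and get NaN.
group["dtm"] = diff
print(group)
```

Only the row whose index label appears in diff is filled, which matches the NaN pattern in the question.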

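To fix this, add .iat[0] to extract the scalar, i.e. h_value = df.loc[df['Level'] == 'H', "Value"].iat[0]. The subtraction then yields a scalar, which is broadcast to every row of the group (this takes the first 'H' row, so each group needs at least one):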

   ID Position Level  Value  dtm
0   1        N     L     15  5.0
1   1        N     H     21  5.0
2   1        N     L     17  5.0
3   1        S     L     12  6.0
4   1        S     H     20  6.0
5   1        S     L     16  6.0
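For reference, a self-contained sketch of the fix applied to the question's data. Two things here are my own assumptions rather than part of the original code: ID is stored as the string "001", and the groups are looped over and concatenated instead of calling .apply, because the shape of the .apply result (group keys in the index, grouping columns kept or dropped) differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["001"] * 6,  # assumption: IDs kept as strings
    "Position": ["N", "N", "N", "S", "S", "S"],
    "Level": ["L", "H", "L", "L", "H", "L"],
    "Value": [15, 21, 17, 12, 20, 16],
})

def rule(group):
    # Work on a copy so the caller's frame is not modified.
    group = group.copy()
    # .iat[0] extracts the first "H" value as a scalar (raises IndexError if there is none).
    h_value = group.loc[group["Level"] == "H", "Value"].iat[0]
    l_value = group.loc[group["Level"] == "L", "Value"].mean()
    # Scalar minus scalar is a scalar, so it is broadcast to every row of the group.
    group["dtm"] = h_value - l_value
    return group

# Swapped-in technique: explicit loop plus concat in place of groupby(...).apply(rule).
new_df = pd.concat([rule(g) for _, g in df.groupby(["ID", "Position"])])
print(new_df)
```

If a group could contain several 'H' rows, .iat[0] silently takes the first one, so a different reduction (for example .mean() or .max()) may be more appropriate there.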

Upvotes: 2
