JA-pythonista
JA-pythonista

Reputation: 1323

How to apply functions in a pandas group

I have a DataFrame. Which I currently groupby a column say column_a. I am trying to get unique items of column_b with the associated column_c:

Like so:

(10.2*4 + 12.4*4 + 4.5*5) / (10.2+12.4+4.5) = 112.9 / 27.1 = 4.166

A snippet of the data is shown below

df =pd.DataFrame({"column_a": [1,1,1,1,1,1,1,1,1], 
                 "column_b": [10.2, 10.2, 10.2, 12.4, 12.4, 12.4, 12.4, 4.5, 4.5],
                 "column_c": [4,4,4,4,4,4,4,5,5]})
df

    column_a    column_b    column_c
0      1         10.2         4
1      1         10.2         4
2      1         10.2         4
3      1         12.4         4
4      1         12.4         4
5      1         12.4         4
6      1         12.4         4
7      1         4.5          5
8      1         4.5          5

Here is what I tried, unfortunately, the unique items in column_c is not the same with column_b, please how can I solve this problem?

g =df.groupby("column_a")

def func1(row):
    unique_b = row["column_b"].unique()
    unique_c = row["column_c"].unique()
    aggregated_b = sum(unique_b)
    aggregated = np.dot(unique_a, unique_b)/aggregated_b
    return aggregated
    
g.apply(func1)

Upvotes: 0

Views: 30

Answers (1)

mozway
mozway

Reputation: 260420

Seems like you want groupby+apply:

(df.drop_duplicates() # you should restrict the columns here if you have more
   .groupby('column_a')
   .apply(lambda g: (g['column_b']*g['column_c']).sum()/g['column_b'].sum())
)

output:

column_a
1    4.166052
dtype: float64

Upvotes: 1

Related Questions