Reputation: 1323
I have a DataFrame. Which I currently groupby a column say column_a
. I am trying to get unique items of column_b
with the associated column_c
:
Like so:
(10.2*4 + 12.4*4 + 4.5*5) / (10.2+12.4+4.5) = 112.9 / 27.1 = 4.166
A snippet of the data is shown below
df =pd.DataFrame({"column_a": [1,1,1,1,1,1,1,1,1],
"column_b": [10.2, 10.2, 10.2, 12.4, 12.4, 12.4, 12.4, 4.5, 4.5],
"column_c": [4,4,4,4,4,4,4,5,5]})
df
column_a column_b column_c
0 1 10.2 4
1 1 10.2 4
2 1 10.2 4
3 1 12.4 4
4 1 12.4 4
5 1 12.4 4
6 1 12.4 4
7 1 4.5 5
8 1 4.5 5
Here is what I tried, unfortunately, the unique items in column_c is not the same with column_b, please how can I solve this problem?
g =df.groupby("column_a")
def func1(row):
unique_b = row["column_b"].unique()
unique_c = row["column_c"].unique()
aggregated_b = sum(unique_b)
aggregated = np.dot(unique_a, unique_b)/aggregated_b
return aggregated
g.apply(func1)
Upvotes: 0
Views: 30
Reputation: 260420
Seems like you want groupby
+apply
:
(df.drop_duplicates() # you should restrict the columns here if you have more
.groupby('column_a')
.apply(lambda g: (g['column_b']*g['column_c']).sum()/g['column_b'].sum())
)
output:
column_a
1 4.166052
dtype: float64
Upvotes: 1