First, let me say that I'm new to pandas.
I am trying to add a new column to a DataFrame. I can do this as shown in my example, but I want to do it by chaining methods so that I don't have to assign intermediate variables. Let me first show what I want to achieve and what I have done so far:
In [1]:
import numpy as np
from pandas import Series,DataFrame
import pandas as pd
In [2]:
np.random.seed(10)
df=pd.DataFrame(np.random.randint(1,5,size=(10, 3)), columns=list('ABC'))
df
Out [2]:
A B C
2 2 1
4 1 2
4 1 2
2 1 2
2 3 1
2 1 3
1 3 1
4 1 1
4 4 3
1 4 3
In [3]:
filtered_DF = df[df['B']<2].copy()
grouped_DF = filtered_DF.groupby('A')
filtered_DF['C_Share_By_Group'] =filtered_DF.C.div(grouped_DF.C.transform("sum"))
filtered_DF
Out [3]:
A B C C_Share_By_Group
4 1 2 0.4
4 1 2 0.4
2 1 2 0.4
2 1 3 0.6
4 1 1 0.2
I want to achieve the same thing by chaining methods. In R, with the dplyr package, I would be able to do something like:
df %>%
  filter(B<2) %>%
  group_by(A) %>%
  mutate('C_Share_By_Group'=C/sum(C))
The pandas documentation says that mutate in R (dplyr) is equivalent to assign in pandas, but assign doesn't work on a grouped object.
When I try to call assign on a grouped DataFrame, I get an error:
"AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method"
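The failure can be reproduced in a few lines. This is only a minimal sketch: the `toy` frame and the `share` column are hypothetical names chosen for illustration, and the exact wording of the message may differ between pandas versions, so the snippet checks only the exception type.

```python
import pandas as pd

# Hypothetical toy data, not the DataFrame from the question.
toy = pd.DataFrame({"A": [1, 1, 2], "C": [1, 3, 4]})
grouped = toy.groupby("A")

try:
    # A DataFrameGroupBy object does not expose DataFrame.assign.
    grouped.assign(share=1)
    raised = False
except AttributeError as err:
    raised = True
    print(type(err).__name__, err)
```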
I have tried the following, but don't know how to add the new column, or if it is even possible to achieve this by chaining methods:
(df.loc[df.B<2]
.groupby('A')
#****WHAT GOES HERE?**** apply(something)?
)
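For reference, one possible way to keep everything in a single chain is to skip the grouped object as the thing being assigned to and instead pass a callable to assign, doing the groupby inside it. This is a sketch under that assumption, not necessarily the only or the best approach; the name `result` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.randint(1, 5, size=(10, 3)), columns=list("ABC"))

result = (
    df.loc[df.B < 2]
    # The lambda receives the already-filtered frame, so the group sums
    # are computed only over the rows where B < 2.
    .assign(
        C_Share_By_Group=lambda d: d.C.div(d.groupby("A").C.transform("sum"))
    )
)
print(result)
```

Because assign returns a new DataFrame, no explicit copy() of the filtered frame is needed here.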