Reputation: 79
I'm trying to learn more about the apply method in pandas, and I'm wondering how to write the following code using apply:
I have a dataframe df like the following:
   A  B  C  D   E  points
0  0  0  0  1  43      94
1  0  0  1  1  55      62
2  1  1  0  1  21      84
3  1  0  1  0  13      20
Furthermore, I have the following function, which does the job:
import pandas as pd

def f1(df):
    df_means = pd.DataFrame(columns=['Mean_Points'])
    for columnname in df.columns:
        # keep only columns that equal 1 in more than one row
        if len(df[df[columnname] == 1]) > 1:
            df_means.loc[columnname] = [df[df[columnname] == 1]['points'].mean()]
    return df_means
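Growing df_means one row at a time through .loc works, but enlarging a DataFrame inside a loop is generally slower than building it once. A minimal sketch of the same loop that collects the means in a plain dict first; the sample df is rebuilt from the table above, and the name f1_dict is hypothetical:

```python
import pandas as pd

# Sample data from the table above.
df = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 1, 0, 1],
    'D': [1, 1, 1, 0],
    'E': [43, 55, 21, 13],
    'points': [94, 62, 84, 20],
})

def f1_dict(df):
    # Hypothetical name: same logic as f1, without row-by-row enlargement.
    means = {}
    for columnname in df.columns:
        mask = df[columnname] == 1   # rows where this column is 1
        if mask.sum() > 1:           # same "more than one 1" rule as f1
            means[columnname] = df.loc[mask, 'points'].mean()
    # Build the one-column result frame in a single step.
    return pd.Series(means, name='Mean_Points').to_frame()

print(f1_dict(df))
```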
So the output of f1 is
  Mean_Points
A          52
C          41
D          80
That's totally fine, but I'm wondering whether it is possible (I'm sure it is) to get the same result with the apply method. I tried:
df_means = pd.DataFrame(columns = ['Mean_Points'])
cols = [col for col in df.columns if len(df[df[col] == 1]) > 1]
df_means.loc[cols] = df[cols].apply(lambda x: df[df[x] == 1]['points'].mean(), axis = 1)
or similar:
df_means = pd.DataFrame(columns = ['Mean_Points'])
df.columns.apply(lambda x: df_means.loc[x] = [df[df[x] == 1]['points'].mean()] if len(df[df[x] == 1]) > 1 else None)
and two or three other things, but nothing worked. I hope somebody can help me here.
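Both attempts hit general Python/pandas restrictions: with axis=1, apply hands each row to the lambda, so df[x] is indexed with a row of values rather than a column name; a lambda cannot contain an assignment such as df_means.loc[x] = ..., so the second attempt is a SyntaxError; and pd.Index has no apply method (it has map). A minimal sketch of a column-wise apply with the default axis=0, assuming the df above; the helper name mean_points_if_repeated is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 1, 0, 1],
    'D': [1, 1, 1, 0],
    'E': [43, 55, 21, 13],
    'points': [94, 62, 84, 20],
})

def mean_points_if_repeated(col):
    # With the default axis=0, apply passes one column (a Series) at a time.
    mask = col.eq(1)
    if mask.sum() > 1:
        return df.loc[mask, 'points'].mean()
    return np.nan  # dropped below, mirroring the "> 1" filter in f1

# apply returns a Series indexed by column name; NaN marks excluded columns.
means = df.apply(mean_points_if_repeated).dropna().to_frame('Mean_Points')
print(means)
```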
Upvotes: 1
Views: 1884
Reputation: 14216
Here is another way to do it; unlike the other answers, it is not pure pandas.
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C', 'D']

def consolidate(series):
    # rows where this column is set (the columns hold 0/1)
    cond = series > 0
    points = df.loc[cond, 'points']
    if len(points) > 1:
        return series.name, points.mean()
    else:
        return series.name, np.nan

df1 = pd.DataFrame([consolidate(df[col]) for col in cols], columns=['name', 'mean_points'])
print(df1)
name mean_points
0 A 52.0
1 B NaN
2 C 41.0
3 D 80.0
If the NaN rows are not needed, then:
df1.dropna()
name mean_points
0 A 52.0
2 C 41.0
3 D 80.0
And using apply:
(df[cols].apply(consolidate, result_type='expand')
    .T.dropna()
    .reset_index()
    .drop('index', axis=1))
0 A 52
1 C 41
2 D 80
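The pandas documentation states that result_type only takes effect when axis=1, so it does not change anything in the call above, which uses the default axis=0. A minimal sketch of an alternative, assuming the same df and cols: a hypothetical consolidate_series returns a labelled pd.Series, which apply turns into a DataFrame with a named mean_points row, so no reset_index/drop cleanup is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 1, 0, 1],
    'D': [1, 1, 1, 0],
    'E': [43, 55, 21, 13],
    'points': [94, 62, 84, 20],
})
cols = ['A', 'B', 'C', 'D']

def consolidate_series(series):
    # Hypothetical variant of consolidate(): same rule, but the result is a
    # labelled Series, so apply (default axis=0) builds a DataFrame from it.
    points = df.loc[series > 0, 'points']
    mean = points.mean() if len(points) > 1 else np.nan
    return pd.Series({'mean_points': mean})

# One column per input column and one row 'mean_points'; transpose, drop NaN rows.
out = df[cols].apply(consolidate_series).T.dropna()
print(out)
```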
Upvotes: 0
Reputation: 294258
pd.DataFrame.dot
# count the 1s in each column and keep only
# the columns where that count is greater than 1
s = df.eq(1).sum().loc[lambda x: x > 1]
df.loc[:, s.index].T.dot(df.points).div(s)
A 52.0
C 41.0
D 80.0
dtype: float64
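This one-liner drops the intermediate variable, but it probably does more calculations than necessary, because it computes the weighted sum for every column and only filters at the end.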
df.T.dot(df.points).div(df.sum())[df.eq(1).sum().gt(1)]
A 52.0
C 41.0
D 80.0
dtype: float64
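Both lines rest on the same identity. For a column c whose entries c_i are all 0 or 1, the dot product with points sums the points of the rows where c_i = 1, and the column sum counts those rows (in the first line, s holds that count). Worked for column A of the sample data:

```latex
\bar{p}_c = \frac{\sum_i c_i \, p_i}{\sum_i c_i} = \frac{c^\top p}{\mathbf{1}^\top c},
\qquad
\bar{p}_A = \frac{0 \cdot 94 + 0 \cdot 62 + 1 \cdot 84 + 1 \cdot 20}{0 + 0 + 1 + 1} = \frac{104}{2} = 52
```

The identity only holds for 0/1 columns; for a column such as E, df.sum() is not a count, so its result is meaningless, and the df.eq(1).sum().gt(1) filter removes it.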
Upvotes: 3
Reputation: 59549
In general, you should try to see if you can avoid using .apply(axis=1).
In this case, you can get by with DataFrame.multiply(), replacing 0 with np.nan so it doesn't count toward the average.
import numpy as np
# np.nan (lowercase); the np.NaN alias was removed in NumPy 2.0
s = df.replace(0, np.nan).multiply(df.points, axis=0).mean()
#A 52.0
#B 84.0
#C 41.0
#D 80.0
#E 2369.0
#points 5034.0
#dtype: float64
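The NaN replacement works because Series.mean skips missing values by default (skipna=True), so masked rows drop out of both the sum and the count. A minimal sketch using the values column A ends up with after the replace and multiply steps:

```python
import numpy as np
import pandas as pd

# Column A after replace(0, np.nan) and multiply(df.points, axis=0)
a = pd.Series([np.nan, np.nan, 84.0, 20.0])

print(a.mean())              # NaN skipped: (84 + 20) / 2 = 52.0
print(a.mean(skipna=False))  # NaN propagates: nan
```

If the 0s were left in place instead, the mean would be (0 + 0 + 84 + 20) / 4 = 26, which is why the replacement is needed.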
Now we'll add your condition to only consider columns with multiple instances of 1, and subset to those with .reindex:
m = df.eq(1).sum().gt(1)
s = s.reindex(m[m].index)
s
A 52.0
C 41.0
D 80.0
dtype: float64
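Putting both steps together, a minimal sketch that swaps replace(0, np.nan) for DataFrame.where, so every value that is not exactly 1 is masked (E and points then come out as NaN instead of a product mean), and filters with a boolean mask instead of .reindex. The helper name mean_points_where_one and its min_count parameter are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 0, 1, 1],
    'B': [0, 0, 1, 0],
    'C': [0, 1, 0, 1],
    'D': [1, 1, 1, 0],
    'E': [43, 55, 21, 13],
    'points': [94, 62, 84, 20],
})

def mean_points_where_one(df, min_count=2):
    # Hypothetical helper: min_count=2 encodes "more than one 1".
    is_one = df.eq(1)
    # Cells that are not 1 become NaN, so mean() ignores them.
    means = df.where(is_one).mul(df['points'], axis=0).mean()
    # Keep only columns with at least min_count ones (aligned on column names).
    return means[is_one.sum() >= min_count]

print(mean_points_where_one(df))
```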
Upvotes: 3