I have the following dataframe df1:
import pandas as pd
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Lisa', 'Molly', 'Lisa', 'Molly', 'Fred'],
'gender': ['m', 'f', 'f', 'm', 'f', 'f', 'f', 'f','f', 'm'],
}
df1 = pd.DataFrame(data, index = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
I want to create a table, df2, with some standard and some custom summary statistics:
df2 = df1.describe()
df2.rename(index={'top':'mode'},inplace=True)
df2.rename(index={'freq':'mode freq'},inplace=True)
df2
df2:
gender name
count 10 10
unique 2 7
mode f Molly
mode freq 7 3
I want to append one row to df2 for the second mode and one for the frequency of the second mode:
Desired output:
gender name
count 10 10
unique 2 7
mode f Molly
mode freq 7 3
2nd mode m Lisa
2nd freq 3 2
I figured out that you can get the second mode and its frequency like this:
for column in df1:
    # value_counts() is sorted by frequency, so position 1 is the second mode
    my_series = df1[column].value_counts().iloc[1:2]
    print(my_series)
But how do I append this to df2?
Counter
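from collections import Counter
def f(s):
    # most_common(2)[1] is the (value, count) pair of the second most frequent value
    return pd.Series(Counter(s).most_common(2)[1], ['mode2', 'mode2 freq'])
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
pd.concat([df1.describe().rename(dict(top='mode1', freq='mode1 freq')), df1.apply(f)])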
name gender
count 10 10
unique 7 2
mode1 Molly f
mode1 freq 3 7
mode2 Lisa m
mode2 freq 2 3
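Why `[1]` picks the second mode: `Counter.most_common(2)` returns a list of `(value, count)` tuples ordered from most to least frequent, so index 1 is the runner-up. A minimal sketch on the question's `name` column (the variable names are only for illustration):

```python
from collections import Counter

import pandas as pd

names = pd.Series(['Jason', 'Molly', 'Tina', 'Jake', 'Amy',
                   'Lisa', 'Molly', 'Lisa', 'Molly', 'Fred'])

# most_common(2) returns the two most frequent (value, count) pairs, highest first
top_two = Counter(names).most_common(2)
print(top_two)  # [('Molly', 3), ('Lisa', 2)]

# Wrapping the second pair in a Series gives one column of the two extra rows
row = pd.Series(top_two[1], index=['mode2', 'mode2 freq'])
print(row)
```

Applying the same function with `df1.apply` does this once per column and lines the results up side by side.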
value_counts
Same thing without Counter
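def f(s):
    c = s.value_counts()  # counts sorted by frequency, most frequent first
    # the label at position 1 is the second mode, the entry is its frequency
    return pd.Series([c.index[1], c.iat[1]], ['mode2', 'mode2 freq'])
pd.concat([df1.describe().rename(dict(top='mode1', freq='mode1 freq')), df1.apply(f)])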
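In the value_counts version, position 1 of the counts Series holds the second mode: its index label is the value and its entry is the frequency. A small sketch on the question's `gender` column:

```python
import pandas as pd

gender = pd.Series(['m', 'f', 'f', 'm', 'f', 'f', 'f', 'f', 'f', 'm'])

# value_counts() sorts by frequency, most frequent first: f -> 7, m -> 3
counts = gender.value_counts()

# Position 1 is the second mode: the label is the value, the entry is the count
second_value = counts.index[1]
second_freq = counts.iat[1]
print(second_value, second_freq)  # m 3
```

If two values share the second-highest count, which one lands at position 1 depends on how pandas orders ties, so you may want to handle that case explicitly.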
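Same thing with pd.factorize and np.bincount:
import numpy as np
def f(s):
    codes, uniques = pd.factorize(s)  # integer code for each value
    c = np.bincount(codes)            # how often each code occurs
    i = np.argpartition(c, -2)[-2]    # position of the second-largest count
    return pd.Series([uniques[i], c[i]], ['mode2', 'mode2 freq'])
pd.concat([df1.describe().rename(dict(top='mode1', freq='mode1 freq')), df1.apply(f)])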
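The factorize/bincount approach can be traced step by step on the `name` column. `np.argpartition(c, -2)` only guarantees that the element at position -2 is the one that would sit there after a full sort, which is exactly the second-largest count, without sorting everything:

```python
import numpy as np
import pandas as pd

names = pd.Series(['Jason', 'Molly', 'Tina', 'Jake', 'Amy',
                   'Lisa', 'Molly', 'Lisa', 'Molly', 'Fred'])

# factorize maps each value to an integer code, in order of first appearance
codes, uniques = pd.factorize(names)

# bincount counts occurrences of each code: [1, 3, 1, 1, 1, 2, 1]
counts = np.bincount(codes)

# kth=-2 places the second-largest count at position -2 of the result
i = np.argpartition(counts, -2)[-2]
print(uniques[i], counts[i])  # Lisa 2
```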
You can use apply with value_counts, then reshape the result to match your dataframe's layout:
df1.apply(lambda x : x.value_counts().iloc[[1]]).stack().reset_index(level=0).T
Out[172]:
name gender
level_0 Lisa m
0 2 3
The final output (rename the index labels with the rename calls shown in the question):
pd.concat([df1.describe(),df1.apply(lambda x : x.value_counts().iloc[[1]]).stack().reset_index(level=0).T])
Out[173]:
gender name
count 10 10
unique 2 7
top f Molly
freq 7 3
level_0 m Lisa
0 3 2
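A variant that skips the stack/reset_index reshaping and produces the question's own row labels directly. This is a sketch assuming a pandas version without `DataFrame.append` (removed in 2.0), so rows are added with `pd.concat`; `second_mode` is a hypothetical helper name:

```python
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Lisa', 'Molly', 'Lisa', 'Molly', 'Fred'],
        'gender': ['m', 'f', 'f', 'm', 'f', 'f', 'f', 'f', 'f', 'm']}
df1 = pd.DataFrame(data, index=range(1, 11))

# Standard summary with the labels used in the question
df2 = df1.describe().rename(index={'top': 'mode', 'freq': 'mode freq'})

def second_mode(s):
    """Hypothetical helper: second most frequent value of s and its count."""
    counts = s.value_counts()  # sorted by frequency, most frequent first
    return pd.Series([counts.index[1], counts.iat[1]], index=['2nd mode', '2nd freq'])

# apply() builds a 2-row frame (one column per original column); concat stacks it under df2
df2 = pd.concat([df2, df1.apply(second_mode)])
print(df2)
```

Because each column's result is already a two-row Series with the final labels, `apply` returns a frame whose columns line up with `df2`, so no transposing is needed.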