Reputation: 1694
I have a function that returns series.index and series.values. How can I write the returned results to a dataframe?
Generate random data
import string
import random
import pandas as pd
text = []
i = 0
while i < 20:
    text.extend(random.choice(string.ascii_letters[:4]))
    i += 1
boolean = ['True', 'False'] * 10  # 20 strings, alternating 'True' and 'False'
bool1 = random.sample(boolean, 20)
bool2 = random.sample(boolean, 20)
bool3 = random.sample(boolean, 20)
bool4 = random.sample(boolean, 20)
d = {'c1':text, 'c2':bool1, 'c3':bool2, 'c4':bool3, 'y':bool4}
dd = pd.DataFrame(data=d)
dd.head(2)
  c1     c2     c3     c4     y
0  b  False  False  False  True
1  a   True   True  False  True
The function
def relative_frequency(df, col):
    series = df.groupby(col)['y'].value_counts(normalize=True)
    true_cnt = series.xs('True', level=1)  # a series with a single-level index
    max_index = true_cnt.index[true_cnt.argmax()]
    max_val = true_cnt[max_index]
    true_cnt_dropped = true_cnt.drop(max_index)
    ans = max_val / true_cnt_dropped
    ans.index = [(col + ' ' + max_index + '/' + idx) for idx in ans.index]
    return ans.index, ans.values
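To make the intermediate steps concrete, here is a minimal sketch on a small fixed frame. The six rows and the names df, shares, true_share, top and ratios are made up for illustration and are not part of the random data above; the calls mirror the ones inside relative_frequency.

```python
import pandas as pd

# Hypothetical fixed data, used instead of the random frame
df = pd.DataFrame({
    'c1': ['a', 'a', 'b', 'b', 'b', 'b'],
    'y':  ['True', 'False', 'True', 'True', 'True', 'False'],
})

# Share of each y value within every c1 group (MultiIndex: c1, y)
shares = df.groupby('c1')['y'].value_counts(normalize=True)

# Keep only the share of 'True' per group (single-level index: c1)
true_share = shares.xs('True', level=1)

# Label of the group with the largest 'True' share
top = true_share.index[true_share.argmax()]

# Largest share divided by each of the remaining shares
ratios = true_share[top] / true_share.drop(top)
print(ratios)
```

Here group a has one 'True' out of two rows and group b has three out of four, so b is the top group and the single ratio is labelled by the remaining group a.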
Run the function
for i in dd.columns[:-1]:
    print(relative_frequency(dd, i))
It returns
(Index(['c1 c/a', 'c1 c/b', 'c1 c/d'], dtype='object'), array([1.8 , 1.05, 1.2 ]))
(Index(['c2 False/True'], dtype='object'), array([1.5]))
(Index(['c3 True/False'], dtype='object'), array([2.33333333]))
(Index(['c4 False/True'], dtype='object'), array([1.5]))
I would like to collect these results into a single dataframe.
Upvotes: 0
Views: 344
Reputation: 19332
In the last part (where you run the function), collect the results like this instead:
.T transposes the dataframe (swaps rows and columns).
dfs.append() appends it to an initially empty list called dfs.
pd.concat combines the collected dataframes vertically, as rows.
dfs = []
for i in dd.columns[:-1]:
    dfs.append(pd.DataFrame(relative_frequency(dd, i)).T)
result = pd.concat(dfs)
result.columns = ['features', 'relative_freq']
result
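As a possible variation, each (index, values) pair can be turned into a two-column dataframe directly, which keeps the ratios numeric and avoids the transpose. The sketch below is only an illustration: it hard-codes the four pairs printed in the question rather than calling relative_frequency, and the names results and frames are my own. ignore_index=True is used so the final frame gets a fresh 0..n-1 index instead of repeating 0 for every piece.

```python
import numpy as np
import pandas as pd

# The (index, values) pairs printed in the question, copied literally
results = [
    (pd.Index(['c1 c/a', 'c1 c/b', 'c1 c/d']), np.array([1.8, 1.05, 1.2])),
    (pd.Index(['c2 False/True']), np.array([1.5])),
    (pd.Index(['c3 True/False']), np.array([2.33333333])),
    (pd.Index(['c4 False/True']), np.array([1.5])),
]

# One small two-column frame per pair, with the column names set up front
frames = [
    pd.DataFrame({'features': idx, 'relative_freq': vals})
    for idx, vals in results
]

# Stack the pieces vertically and renumber the rows
result = pd.concat(frames, ignore_index=True)
print(result)
```

With the real function, results could presumably be built as [relative_frequency(dd, c) for c in dd.columns[:-1]].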
Upvotes: 1