Osca

Reputation: 1694

Create a dataframe from returned values from a function

I have a function that returns series.index and series.values. How can I write the returned results to a dataframe?

Generate random data

import string
import random
import pandas as pd

text = []
i = 0
while i < 20:
    text.extend(random.choice(string.ascii_letters[:4]))
    i += 1

boolean = ['True', 'False', 'True', 'False', 'True', 'False', 'True', 'False', 'True', 'False', 'True', 'False', 'True', 'False', 'True', 'False', 'True', 'False', 'True', 'False']
bool1 = random.sample(boolean, 20)
bool2 = random.sample(boolean, 20)
bool3 = random.sample(boolean, 20)
bool4 = random.sample(boolean, 20)

d = {'c1':text, 'c2':bool1, 'c3':bool2, 'c4':bool3, 'y':bool4}
dd = pd.DataFrame(data=d)

dd.head(2)

    c1  c2  c3  c4  y
0   b   False   False   False   True
1   a   True    True    False   True

The function

def relative_frequency(df, col):
    series = df.groupby(col)['y'].value_counts(normalize=True)
    true_cnt = series.xs('True', level=1)  # a series with single layer index
    max_index = true_cnt.index[true_cnt.argmax()]
    max_val = true_cnt[max_index]
    true_cnt_dropped = true_cnt.drop(max_index)
    ans = max_val / true_cnt_dropped
    ans.index = [(col + ' ' + max_index + '/' + idx) for idx in ans.index]
    return ans.index, ans.values

Run the function

for i in dd.columns[:-1]:
    print(relative_frequency(dd, i))

It returns

(Index(['c1 c/a', 'c1 c/b', 'c1 c/d'], dtype='object'), array([1.8 , 1.05, 1.2 ]))
(Index(['c2 False/True'], dtype='object'), array([1.5]))
(Index(['c3 True/False'], dtype='object'), array([2.33333333]))
(Index(['c4 False/True'], dtype='object'), array([1.5]))

I would like to build a dataframe like this

[image: the desired dataframe]

Upvotes: 0

Views: 344

Answers (1)

Akshay Sehgal

Reputation: 19332

In the last part (where you run the function), do this instead:

  1. pd.DataFrame(...) converts the output of the function into a dataframe
  2. .T transposes it (swaps rows and columns)
  3. dfs.append() appends it to a list called dfs, which starts out empty
  4. pd.concat combines the collected dataframes vertically, as rows
  5. The column names are set at the end

dfs = []

for i in dd.columns[:-1]:
    dfs.append(pd.DataFrame(relative_frequency(dd, i)).T)
    
result = pd.concat(dfs)
result.columns = ['features', 'relative_freq']
result

[image: the resulting dataframe with columns features and relative_freq]
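
If you are free to change the function, a possible alternative is to return the labelled Series itself and let pandas do the stacking. The sketch below is only an illustration under some assumptions: relative_frequency_series is a hypothetical variant of the function from the question, the tiny dd frame is made-up data (not the random data above), and y holds the strings 'True'/'False' as in the question.

```python
import pandas as pd


def relative_frequency_series(df, col, target="y", positive="True"):
    """Hypothetical variant of relative_frequency that returns a Series."""
    # Share of each target value within every group of `col`
    shares = df.groupby(col)[target].value_counts(normalize=True)
    # Keep only the share of the positive label; the index is now the group
    true_share = shares.xs(positive, level=1)
    # Group with the highest positive share
    max_label = true_share.idxmax()
    # Ratio of the highest share to every other group's share
    ratios = true_share[max_label] / true_share.drop(max_label)
    ratios.index = [f"{col} {max_label}/{idx}" for idx in ratios.index]
    return ratios


# Small made-up frame, only for illustration
dd = pd.DataFrame({
    "c1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "c2": ["True", "True", "False", "False", "True", "True", "False", "False"],
    "y": ["True", "False", "False", "False", "True", "True", "False", "True"],
})

# One Series per feature column, stacked and turned into two columns
parts = [relative_frequency_series(dd, c) for c in dd.columns[:-1]]
result = (
    pd.concat(parts)
    .rename_axis("features")
    .reset_index(name="relative_freq")
)
print(result)
```

Compared with the loop above, this keeps relative_freq as a float column and gives the result a fresh 0..n-1 index, because reset_index moves the labels into the features column. Passing ignore_index=True to pd.concat in the loop version should give a similar clean index there.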

Upvotes: 1
