ccsv

Reputation: 8669

Python pandas: creating a function that calculates the row-wise mean across n columns

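I have 2 different dataframes of coin flips. I want to write a function that finds 2 things for each user: the average score (H = 1, T = 0) over the rounds they played, and the number of rounds they played.

Is it possible to make the function work for any number n of round columns?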

import pandas as pd
import numpy as np

# Missing flips are stored as real NaN values (np.nan), not as the string 'np.nan'
df = pd.DataFrame({'Users': ['Bob', 'Jim', 'Ted', 'Jesus', 'James'],
                   'Round 1': [np.nan, 'H', np.nan, 'T', 'H'],
                   'Round 2': [np.nan, 'H', 'H', 'H', 'T'],
                   'Round 3': [np.nan, 'T', 'T', 'T', 'T'],
                   })

df2 = pd.DataFrame({'Users': ['Boob', 'Paul', 'Todd', 'Zeus', 'Derrik'],
                    'Round 1': ['H', 'H', np.nan, 'T', np.nan],
                    'Round 3': ['H', 'T', 'H', 'T', np.nan],
                    'Round 5': ['H', 'T', 'H', 'T', np.nan],
                    'Round 7': ['H', 'H', 'H', 'H', 'H'],
                    })

df = df.set_index('Users')
df2 = df2.set_index('Users')
print(df)
print(df2)

Here is what I tried:

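def score(data):
    score_map = {'H': 1, 'T': 0}
    data = data.replace(score_map)
    data['average'] = ...        # ? mean of the scored rounds in each row
    data['rounds played'] = ...  # ? number of rounds each user actually played
    return data

df = score(df)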

My guess is that I need to use groupby, if this is possible at all.
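On the groupby guess: per-row statistics do not need groupby, because pandas reductions such as mean and count accept axis=1 and then work across the columns of each row. A minimal sketch using Jim's and Ted's rows from the question, already scored as 1/0 (the names scores, row_mean and row_count are just for illustration):

```python
import numpy as np
import pandas as pd

# Jim's and Ted's rounds from the question, already scored (H = 1, T = 0)
scores = pd.DataFrame({'Round 1': [1, np.nan],
                       'Round 2': [1, 1],
                       'Round 3': [0, 0]},
                      index=pd.Index(['Jim', 'Ted'], name='Users'))

row_mean = scores.mean(axis=1)    # mean across each row; NaN is skipped by default
row_count = scores.count(axis=1)  # number of non-missing cells in each row
print(row_mean)
print(row_count)
```

Neither call mentions specific column names, so the same two lines work for any number of round columns.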

The result should look something like this:

       Round 1  Round 2  Round 3  Average  Rounds played
Users
Bob        NaN      NaN      NaN      NaN              0
Jim          1        1        0     0.67              3
Ted        NaN        1        0     0.50              2
Jesus        0        1        0     0.33              3
James        1        0        0     0.33              3
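Written as a formula (a sketch of the intended definition, inferred from the example table), with $s_{u,r}$ the score of user $u$ in round $r$ (H = 1, T = 0) and $R_u$ the set of rounds in which $u$ has a recorded flip:

```latex
n_u = |R_u|, \qquad
\text{Average}_u = \frac{1}{n_u} \sum_{r \in R_u} s_{u,r} \quad (n_u > 0)
```

For Ted, $R_u = \{\text{Round 2}, \text{Round 3}\}$, so $n_u = 2$ and the average is $(1 + 0)/2 = 0.5$; for Bob $n_u = 0$, so the average is undefined (NaN).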


Answers (1)

Happy001

Reputation: 6383

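Map each flip to 1 (H), 0 (T) or NaN (anything else), then take row-wise statistics with axis=1: mean(axis=1) averages only the non-missing values and count(axis=1) counts them, so neither depends on how many round columns there are.

def score_map(x):
    # 'H' -> 1, 'T' -> 0, anything else (e.g. a missing flip) -> NaN
    if x == 'H':
        return 1
    if x == 'T':
        return 0
    return np.nan

def score(data):
    # applymap applies score_map to every cell
    # (pandas >= 2.1 renames it to DataFrame.map and deprecates applymap)
    return_df = data.applymap(score_map)
    avg = return_df.mean(axis=1)       # row-wise mean, NaN skipped
    nrounds = return_df.count(axis=1)  # number of non-NaN cells per row
    return_df['Average'] = avg
    return_df['Rounds Played'] = nrounds
    return return_df

score(df)

Calling score(df) gives:

       Round 1  Round 2  Round 3   Average  Rounds Played
Users                                                    
Bob        NaN      NaN      NaN       NaN              0
Jim          1        1        0  0.666667              3
Ted        NaN        1        0  0.500000              2
Jesus        0        1        0  0.333333              3
James        1        0        0  0.333333              3

[5 rows x 5 columns]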
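To check that the approach does not depend on the number of round columns, here is a sketch that applies the same idea to df2, which has four round columns. As an assumption, it swaps applymap for a per-column Series.map with a dict (score_rounds and SCORE_MAP are hypothetical names), which also turns missing flips into NaN and sidesteps the applymap deprecation in newer pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical name: the same H/T coding as in the question
SCORE_MAP = {'H': 1, 'T': 0}


def score_rounds(data):
    """Return data scored as 1/0/NaN plus 'Average' and 'Rounds Played' columns."""
    # Series.map with a dict turns any value not in the dict (including NaN) into NaN
    scored = data.apply(lambda col: col.map(SCORE_MAP))
    # Compute both statistics before adding columns, so only round columns are used
    avg = scored.mean(axis=1)      # NaN is skipped (skipna=True by default)
    played = scored.count(axis=1)  # number of non-NaN cells in each row
    scored['Average'] = avg
    scored['Rounds Played'] = played
    return scored


df2 = pd.DataFrame({'Users': ['Boob', 'Paul', 'Todd', 'Zeus', 'Derrik'],
                    'Round 1': ['H', 'H', np.nan, 'T', np.nan],
                    'Round 3': ['H', 'T', 'H', 'T', np.nan],
                    'Round 5': ['H', 'T', 'H', 'T', np.nan],
                    'Round 7': ['H', 'H', 'H', 'H', 'H'],
                    }).set_index('Users')

result = score_rounds(df2)
print(result)
```

Computing the mean and count before assigning the new columns matters: if 'Average' were added first, a later count(axis=1) over all columns would count it as an extra round.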

