Reputation: 5

Counter according to other data and a range

The problem here's that I need a function to calculate the number of people who likes certain consoles, according to their civil status, gender, and a list of their ages (thought range() could work). Already made the code using pandas, but somehow i couldn't find any command that could help me.

        CustomerNo  Name LastName  Age CivilState Gender FavouriteConsole
0           1        Joe     Smith   48          M      M                W
1           2   Jonathan      Cage   20          S      M                X
2           3       Lucy     Chang   26          S      F                P
99         100  Alexander   Levine   41          M      M                X

For Favourite console W stands for wii, P for Ps3 and x for xbox.

What i need is something like this...

For Wii there are x people, which x2 is male and x3 female, x3 are married and x4 are single, the range onf ages is [x5 to x6]

for each one of the consoles...

Upvotes: 0

Answers (3)

Steve Misuta

Reputation: 1033

Assuming your data is in DataFrame named df:

dfW = df[(df.FavouriteConsole == 'W')]   # select Wii lovers
male = (dfW.Gender == 'M').sum()         # count males
female = (dfW.Gender == 'F').sum()       # count female
min_age = dfW.Age.min()                  # minimum age
etc...

EDIT: Just want to follow up how you could summarize this data in a DataFrame and then access whatever you wanted:

cons = []
g = df.groupby('FavouriteConsole')
for gp in g:
    cons.append([gp[0],                             # Console Type
                 gp[1].shape[0],                    # Count = number rows
                (gp[1].Gender == 'M').sum(),        # Males
                (gp[1].Gender == 'F').sum(),        # Females
                (gp[1].CivilState == 'M').sum(),    # Married
                (gp[1].CivilState == 'S').sum(),    # Single
                 gp[1].Age.min(),                   # Min Age
                 gp[1].Age.max()])                  # Max Age
summary = pd.DataFrame(cons, 
            columns=['Console','Count','Male','Female', 'Married', 
                        'Single', 'Min_Age','Max_Age'])
summary.set_index('Console', inplace=True)
print(summary)

         Count  Male  Female  Married  Single  Min_Age  Max_Age
Console                                                        
P            1     0       1        0       1       26       26
W            1     1       0        1       0       48       48
X            2     2       0        1       1       20       41

And you can access any particular characteristic like this:

In [20]: summary.loc['X','Male']
Out[20]: 2

In [21]: summary.loc['P','Single']
Out[21]: 1

Upvotes: 1

sohail288

Reputation: 423

Age Range

 g = df.groupby('FavouriteConsole')

Counts for all consoles

  counts   = g.size().loc

Male Female

gender_group = df.groupby(['FavouriteConsole', 'Gender']).size().loc

Civil State

civil_group = df.groupby(['FavouriteConsole', 'CivilState']).size().loc

and the final string

fmt = "For {} there are {} people, which {} is male and {} female, {} are married and {} are single, the range onf ages is [{} to {}]"

for console in ['P', 'X', 'W']:
    fmt.format(console, 
               counts[console], 
               gender_group[console].loc['M'],
               gender_group[console].loc['F'],
               civil_group[console].loc['M'],
               civil_group[console].loc['S'],
               g.Age.min(),
               g.Age.max())

This crashes with missing values. For example, if there are no male wii players.

Upvotes: 0

Pravar Jawalekar

Reputation: 605

Below is general function and not related to Panda.

Suppose you have data in some file say a.txt. Below code scans file and prints output as mentioned in question,

#!/usr/bin/python

import sys
import os

def calc(console_name):
        f = open("a.txt")

        lines = f.readlines()[1:]
        # skipped first line

        num_people = 0
        num_males = 0
        num_females = 0
        num_married = 0
        num_unmarried = 0
        min_age = 0
        max_age = 0


        for line in lines:
                fields = line.split()

                if (fields[-1] == console_name):
                        num_people += 1

                        # Age
                        age = int(fields[4])
                        if min_age == 0:
                                min_age = age
                        else:
                                if age < min_age:
                                        min_age = age

                        if max_age == 0:
                                max_age = age
                        else:
                                if age >= max_age:
                                        max_age = age

                        # civil status
                        if fields[5] == 'M':
                                num_married += 1
                        elif fields[5] == 'S':
                                num_unmarried += 1

                        # Gender
                        if fields[6] == 'M':
                                num_males += 1
                        else:
                                num_females += 1


        print "For %s there are %s people, which %s are male and %s female, \
                %s are married and %s are unmarried, age range [ %s - %s]" % (console_name, num_people, num_males, num_females, num_married, num_unmarried, min_age, max_age)

c = 'W'
calc(c)

calc is function which does actual work. I have tested it for W field.

   Elite-MT-PC:~/Documents/programs$ python a.py 
    For W there are 1 people, which 1 are male and 0 female, 1 are married and 0 are unmarried, age range [ 48 - 48]

Upvotes: 0

Counter according to other data and a range

Answers (3)

Related Questions