Reputation: 5
The problem here's that I need a function to calculate the number of people who likes certain consoles, according to their civil status, gender, and a list of their ages (thought range() could work). Already made the code using pandas, but somehow i couldn't find any command that could help me.
CustomerNo Name LastName Age CivilState Gender FavouriteConsole
0 1 Joe Smith 48 M M W
1 2 Jonathan Cage 20 S M X
2 3 Lucy Chang 26 S F P
99 100 Alexander Levine 41 M M X
For Favourite console W stands for wii, P for Ps3 and x for xbox.
What i need is something like this...
For Wii there are x people, which x2 is male and x3 female, x3 are married and x4 are single, the range onf ages is [x5 to x6]
for each one of the consoles...
Upvotes: 0
Views: 223
Reputation: 1033
Assuming your data is in DataFrame named df:
dfW = df[(df.FavouriteConsole == 'W')] # select Wii lovers
male = (dfW.Gender == 'M').sum() # count males
female = (dfW.Gender == 'F').sum() # count female
min_age = dfW.Age.min() # minimum age
etc...
EDIT: Just want to follow up how you could summarize this data in a DataFrame and then access whatever you wanted:
cons = []
g = df.groupby('FavouriteConsole')
for gp in g:
cons.append([gp[0], # Console Type
gp[1].shape[0], # Count = number rows
(gp[1].Gender == 'M').sum(), # Males
(gp[1].Gender == 'F').sum(), # Females
(gp[1].CivilState == 'M').sum(), # Married
(gp[1].CivilState == 'S').sum(), # Single
gp[1].Age.min(), # Min Age
gp[1].Age.max()]) # Max Age
summary = pd.DataFrame(cons,
columns=['Console','Count','Male','Female', 'Married',
'Single', 'Min_Age','Max_Age'])
summary.set_index('Console', inplace=True)
print(summary)
Count Male Female Married Single Min_Age Max_Age
Console
P 1 0 1 0 1 26 26
W 1 1 0 1 0 48 48
X 2 2 0 1 1 20 41
And you can access any particular characteristic like this:
In [20]: summary.loc['X','Male']
Out[20]: 2
In [21]: summary.loc['P','Single']
Out[21]: 1
Upvotes: 1
Reputation: 423
Age Range
g = df.groupby('FavouriteConsole')
Counts for all consoles
counts = g.size().loc
Male Female
gender_group = df.groupby(['FavouriteConsole', 'Gender']).size().loc
Civil State
civil_group = df.groupby(['FavouriteConsole', 'CivilState']).size().loc
and the final string
fmt = "For {} there are {} people, which {} is male and {} female, {} are married and {} are single, the range onf ages is [{} to {}]"
for console in ['P', 'X', 'W']:
fmt.format(console,
counts[console],
gender_group[console].loc['M'],
gender_group[console].loc['F'],
civil_group[console].loc['M'],
civil_group[console].loc['S'],
g.Age.min(),
g.Age.max())
This crashes with missing values. For example, if there are no male wii players.
Upvotes: 0
Reputation: 605
Below is general function and not related to Panda.
Suppose you have data in some file say a.txt
.
Below code scans file and prints output as mentioned in question,
#!/usr/bin/python
import sys
import os
def calc(console_name):
f = open("a.txt")
lines = f.readlines()[1:]
# skipped first line
num_people = 0
num_males = 0
num_females = 0
num_married = 0
num_unmarried = 0
min_age = 0
max_age = 0
for line in lines:
fields = line.split()
if (fields[-1] == console_name):
num_people += 1
# Age
age = int(fields[4])
if min_age == 0:
min_age = age
else:
if age < min_age:
min_age = age
if max_age == 0:
max_age = age
else:
if age >= max_age:
max_age = age
# civil status
if fields[5] == 'M':
num_married += 1
elif fields[5] == 'S':
num_unmarried += 1
# Gender
if fields[6] == 'M':
num_males += 1
else:
num_females += 1
print "For %s there are %s people, which %s are male and %s female, \
%s are married and %s are unmarried, age range [ %s - %s]" % (console_name, num_people, num_males, num_females, num_married, num_unmarried, min_age, max_age)
c = 'W'
calc(c)
calc
is function which does actual work. I have tested it for W
field.
Elite-MT-PC:~/Documents/programs$ python a.py
For W there are 1 people, which 1 are male and 0 female, 1 are married and 0 are unmarried, age range [ 48 - 48]
Upvotes: 0