Reputation: 15
I have one category defined by certain characteristics (height and weight, selected with np.where) and a second category defined by other characteristics (whether someone is a twin and how many siblings they have, also selected with np.where). I want to count how many rows fall into both categories at the same time, i.e. how many would be in the overlapping center if a Venn diagram were drawn.
I'm importing columns of a CSV file. This is what the table looks like:
  Child  Inches  Weight Twin  Siblings
0     A      53     100    Y         3
1     B      54     110    N         4
2     C      56     120    Y         2
3     D      58     165    Y         1
4     E      60     150    N         1
5     F      62     160    N         1
6     H      65     165    N         3
import pandas as pd
import numpy as np
file = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
#%%
height = file["Inches"]
weight = file["Weight"]
twin = file["Twin"]
siblings = file["Siblings"]
#%%
area1 = np.where((height <= 60) & (weight <= 150))[0]
#%%
#has two or more siblings (and is a twin)
group_a = np.where((siblings >= 2) & (twin == 'Y'))[0]
#has two or more siblings (and is not a twin)
group_b = np.where((siblings >= 2) & (twin == 'N'))[0]
#has only one sibling (and is twin)
group_c = np.where((siblings == 1) & (twin == 'Y'))[0]
#has only one sibling (and is not a twin)
group_d = np.where((siblings == 1) & (twin == 'N'))[0]
#%%
for i in area1:
    if group_a==True:
        print("in area1 there are", len(i), "children in group_a")
    elif group_b==True:
        print("in area1 there are", len(i), "children in group_b")
    elif group_c==True:
        print("in area1 there are", len(i), "children in group_c")
    elif group_d==True:
        print("in area1 there are", len(i), "children in group_d")
I get the error: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
I'm hoping for an output like:
"in area1 there are 2 children in group_a"
"in area1 there are 1 children in group_b"
"in area1 there are 0 children in group_c"
"in area1 there are 1 children in group_d"
Thanks in advance!
Upvotes: 1
Views: 89
Reputation: 10893
For your example, I would take a slightly different approach and store each condition as a 0/1 indicator column. You can do this:
df['area1'] = np.where((df.Inches <= 60) & (df.Weight <= 150),1,0)
df['group_a'] = np.where((df.Siblings >= 2) & (df.Twin == 'Y'),1,0)
df['group_b'] = np.where((df.Siblings >= 2) & (df.Twin == 'N'),1,0)
df['group_c'] = np.where((df.Siblings == 1) & (df.Twin == 'Y'),1,0)
df['group_d'] = np.where((df.Siblings == 1) & (df.Twin == 'N'),1,0)
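and the result would look like this:
  Child  Inches  Weight Twin  Siblings  area1  group_a  group_b  group_c  group_d
0     A      53     100    Y         3      1        1        0        0        0
1     B      54     110    N         4      1        0        1        0        0
2     C      56     120    Y         2      1        1        0        0        0
3     D      58     165    Y         1      0        0        0        1        0
4     E      60     150    N         1      1        0        0        0        1
5     F      62     160    N         1      0        0        0        0        1
6     H      65     165    N         3      0        0        1        0        0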
From this point you can query any group. For example, to count the children in area1 who are also in group_b, you would do:
df.groupby(['area1'])['group_b'].sum()[1]
and you'll have your desired result: 1. You can swap sum for count (or another aggregation) to suit your table.
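To try that lookup without the CSV, here is a minimal sketch. It assumes the sample table from the question is rebuilt inline (the inline DataFrame stands in for the file), and it compares the groupby lookup with a plain boolean filter, which should give the same count. The names via_groupby and via_filter are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: the question's sample table, rebuilt inline instead of read from the CSV.
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# 0/1 indicator columns, as in the answer above.
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_b"] = np.where((df.Siblings >= 2) & (df.Twin == "N"), 1, 0)

# Sum group_b per area1 value, then pick the area1 == 1 bucket by label.
via_groupby = df.groupby("area1")["group_b"].sum().loc[1]

# Same count without groupby: keep only area1 rows, then sum the indicator.
via_filter = df.loc[df["area1"] == 1, "group_b"].sum()

print(via_groupby, via_filter)
```

Using .loc[1] instead of [1] makes it explicit that 1 is the area1 label rather than a position.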
Finally, looping over the group columns:
for col in df.columns[6:]:
    r = df.groupby(['area1'])[col].sum()[1]
    print("in area1 there are", r, 'children in', col)
would yield:
in area1 there are 2 children in group_a
in area1 there are 1 children in group_b
in area1 there are 0 children in group_c
in area1 there are 1 children in group_d
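If you would rather keep the index arrays from the question, the Venn-style overlap can also be sketched with np.intersect1d, which returns the sorted values common to two arrays. This is an alternative to the indicator columns, not part of the approach above. The inline DataFrame and the groups and counts dictionaries are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: the question's sample table, rebuilt inline instead of read from the CSV.
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Row positions that satisfy each condition, as in the question.
area1 = np.where((df["Inches"] <= 60) & (df["Weight"] <= 150))[0]
groups = {
    "group_a": np.where((df["Siblings"] >= 2) & (df["Twin"] == "Y"))[0],
    "group_b": np.where((df["Siblings"] >= 2) & (df["Twin"] == "N"))[0],
    "group_c": np.where((df["Siblings"] == 1) & (df["Twin"] == "Y"))[0],
    "group_d": np.where((df["Siblings"] == 1) & (df["Twin"] == "N"))[0],
}

# The overlap of two position arrays is the center of the Venn diagram.
counts = {name: len(np.intersect1d(area1, idx)) for name, idx in groups.items()}

for name, n in counts.items():
    print("in area1 there are", n, "children in", name)
```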
Upvotes: 0
Reputation: 384
I'm not sure what you are trying to do with i and the loop, but this should work:
import pandas as pd
file_data = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
area1 = file_data[file_data['Inches'] <= 60]
area1 = area1[area1['Weight'] <= 150]
group_a = area1[area1['Siblings'] >= 2]
group_a = group_a[group_a['Twin'] == 'Y']
group_b = area1[area1['Siblings'] >= 2]
group_b = group_b[group_b['Twin'] == 'N']
group_c = area1[area1['Siblings'] == 1]
group_c = group_c[group_c['Twin'] == 'Y']
group_d = area1[area1['Siblings'] == 1]
group_d = group_d[group_d['Twin'] == 'N']
print("in area1 there are", len(group_a.index), "children in group_a")
print("in area1 there are", len(group_b.index), "children in group_b")
print("in area1 there are", len(group_c.index), "children in group_c")
print("in area1 there are", len(group_d.index), "children in group_d")
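As for the original error: group_a in the question is a NumPy array of row positions, so group_a == True is itself an array, and an if statement cannot reduce an array with more than one element to a single True or False. The sketch below reproduces that on the sample data (rebuilt inline as an assumption) and then counts the overlap by combining boolean masks with & and calling .sum(). The names in_area1, in_group_a, and count_a are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: the question's sample table, rebuilt inline instead of read from the CSV.
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Reproduce the problem: group_a holds row positions, not a single boolean.
group_a = np.where((df["Siblings"] >= 2) & (df["Twin"] == "Y"))[0]
error_message = None
try:
    if group_a == True:  # elementwise comparison yields a two-element array
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Alternative: keep the conditions as boolean masks and intersect them with &.
in_area1 = (df["Inches"] <= 60) & (df["Weight"] <= 150)
in_group_a = (df["Siblings"] >= 2) & (df["Twin"] == "Y")

# True counts as 1, so summing the combined mask counts rows in both sets.
count_a = int((in_area1 & in_group_a).sum())
print("in area1 there are", count_a, "children in group_a")
```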
Upvotes: 0