Reputation: 507
I am going to try to express this problem in the most general way possible. Suppose I have a pandas dataframe with multiple columns ['A', 'B', 'C', 'D']
.
For each unique value in 'A', I need to get the following ratio: the number of times 'B' == x
, divided by the number of times 'B' == y
, when 'C' == q OR p
...
I'm sorry, but I don't know how to express this pythonically.
Sample data:
df = pd.DataFrame({'A': ['foo', 'zar', 'zar', 'bar', 'foo', 'bar','foo', 'bar', 'tar', 'foo', 'foo'],
'B': ['one', 'two', 'four', 'three', 'one', 'two', 'three','two', 'two', 'one', 'three'],
'C': np.random.randn(11),'D': np.random.randn(11)})`
I need something like the following. For each unique value i
in 'A'
, I need the ratio of the number of times 'B' == 'one'
over the number of times 'B' == 'two'
when 'C' > 2
.
So, an output would be something like:
foo = 0.75
Upvotes: 1
Views: 73
Reputation: 196
I multiplied np.random.randn(11)
by 10 so that the C > 2
constraint can exist, since np.random.randn(11)
returns decimal values. The following code will produce what you want in steps. Feel free to condense. Also, it was ambiguous whether the C > 2
constraint applies to both the numerator and denominator or just the denominator. I assumed just the denominator. If you need it to be applied to the numerator, add the [df.C > 2]
constraint to the n
variable as well. Also, the ratios returned for this current df are inf
if divide by 0 occurs and nan
if 0 divided by 0 occurs.
for i in df.A.unique():
#print unique value
print(f"Unique Val: {i}")
#print numerator
print("Numerator:")
n = (df[df.A == i].B == 'one').sum()
print(n)
#print denominator
print("Denominator:")
d = (df[df.A == i][df.C > 2].B == 'two').sum()
print(d)
#print ratio
print("Ratio:")
r = n/d
print(r, "\n")
Upvotes: 1