porteclefs
porteclefs

Reputation: 507

multiple condition grouping and counting pandas

I am going to try to express this problem in the most general way possible. Suppose I have a pandas dataframe with multiple columns ['A', 'B', 'C', 'D'].

For each unique value in 'A', I need to get the following ratio: the number of times 'B' == x, divided by the number of times 'B' == y, when 'C' == q OR p...

I'm sorry, but I don't know how to express this pythonically.

Sample data:

df = pd.DataFrame({'A': ['foo', 'zar', 'zar', 'bar', 'foo', 'bar','foo', 'bar', 'tar', 'foo', 'foo'],
'B': ['one', 'two', 'four', 'three', 'one', 'two', 'three','two', 'two', 'one', 'three'],
'C': np.random.randn(11),'D': np.random.randn(11)})`

I need something like the following. For each unique value i in 'A', I need the ratio of the number of times 'B' == 'one' over the number of times 'B' == 'two' when 'C' > 2.

So, an output would be something like: foo = 0.75

Upvotes: 1

Views: 73

Answers (1)

Michaelpanicci
Michaelpanicci

Reputation: 196

I multiplied np.random.randn(11) by 10 so that the C > 2 constraint can exist, since np.random.randn(11) returns decimal values. The following code will produce what you want in steps. Feel free to condense. Also, it was ambiguous whether the C > 2 constraint applies to both the numerator and denominator or just the denominator. I assumed just the denominator. If you need it to be applied to the numerator, add the [df.C > 2] constraint to the n variable as well. Also, the ratios returned for this current df are inf if divide by 0 occurs and nan if 0 divided by 0 occurs.

for i in df.A.unique():
    #print unique value
    print(f"Unique Val: {i}")

    #print numerator
    print("Numerator:")
    n = (df[df.A == i].B == 'one').sum()
    print(n)

    #print denominator
    print("Denominator:")
    d = (df[df.A == i][df.C > 2].B == 'two').sum()
    print(d)

    #print ratio
    print("Ratio:")
    r = n/d
    print(r, "\n")

Upvotes: 1

Related Questions