Chrestomanci
Chrestomanci

Reputation: 221

Count conditional occurrence of values in multiple cols of one df using another df

I have 2 dataframes; the first df_data looks like this:

A   B   C
-----------------
0   0   1
1   4   1
1   10  1
2   8   1
2   9   1
3   7   1
3   4   0
4   8   1
5   8   1
6   7   1
9   17  1
9   13  1

...

The desired output (df_output) looks like this:

Day Zt  Zr
-----------------
1   2   0
2   4   0
3   5   1
4   6   1
5   6   0
6   7   0
7   7   0
8   5   0
9   5   0
10  5   0
11  5   0
12  6   0

...

Basically I created a new dataframe (df_output) that only has the Days filled, which represents each day of a given month. I have been trying to figure out how to count occurrences of the values in df_data based on the following conditionals (these conditionals don't actually work since it is pulling from different df's):

df_output['Zt'] = (df_data['A'] <= df_output['Day']) & (df_data['B'] >= df_output['Day']) & (df_data['C'] == 1).count()

df_output['Zr'] = (df_data['A'] <= df_output['Day']) & (df_data['B'] >= df_output['Day']) & (df_data['C'] == 0).count()

The logic for calculating Zt and Zr in more readable format is (if it were a function):

def countZt():
   for each day in df_output['Day']:
   Zt = 0
      for each row in df_data:
         if (df_data['A'] <= day) and (df_data['B'] >= day) and (df_data['C'] == 1):
            Zt = Zt + 1
   return Zt

def countZr():
   for each day in df_output['Day']:
   Zr = 0
      for each row in df_data:
         if (df_data['A'] <= day) and (df_data['B'] >= day) and (df_data['C'] == 0):
            Zr = Zr + 1
   return Zr

You can see that the only difference between Zt and Zr is that one is counting those where the Bool (C) was True, and the other is counting where it was False.

The resources I've discovered on this subject only guide on how to make conditionals within the same dataframe, but there is a scarcity/absence of resources that show how to use the value of one dataframe to count the other. Any help pointing me in the right direction would be greatly appreciated! Thank you!

Upvotes: 0

Views: 31

Answers (1)

Onyambu
Onyambu

Reputation: 79208

we could reshape before comparison as shown below:

cond = (df.A.values<=df_output.Day[:,None])&(df.B.values>=df_output.Day[:,None])
df_output.assign(Zt = cond.dot(df.C),Zr = cond.dot(1-df.C))
    Day  Zt  Zr
0     1   2   0
1     2   4   0
2     3   5   1
3     4   6   1
4     5   6   0
5     6   7   0
6     7   7   0
7     8   5   0
8     9   4   0
9    10   3   0
10   11   2   0
11   12   2   0

Upvotes: 1

Related Questions