user2662565
user2662565

Reputation: 529

Running count of the number of days that satisfy a condition

This is a very difficult question to phrase properly, and I have already visited numerous posts on SO without finding a solution for this problem.

I have 2 dataframes with identical dimensions, (in reality they are each 983 27, but in this example they are 10 x 5).

df1 <- data.frame(v=runif(10),w=runif(10),x=runif(10),y=runif(10),z=runif(10))
df2 <- data.frame(v=runif(10),w=runif(10),x=runif(10),y=runif(10),z=runif(10))

df1
            v          w          x           y          z
1  0.47183652 0.22260903 0.22871379 0.549137695 0.19310086
2  0.26030258 0.33811230 0.66651066 0.432569755 0.88964481
3  0.99671428 0.87778858 0.76554728 0.486628372 0.28298038
4  0.51320543 0.62279625 0.52370766 0.003457935 0.20230251
5  0.09182823 0.88205170 0.43630438 0.308291706 0.03875207
6  0.29005832 0.96372511 0.65346596 0.411204978 0.22091272
7  0.76790152 0.47633721 0.79825487 0.329127652 0.48165651
8  0.85939833 0.70695256 0.05128899 0.631819822 0.26584177
9  0.14903837 0.09196876 0.56711615 0.443217700 0.33934426
10 0.79928314 0.15035157 0.82297350 0.203435449 0.21088680

df2
           v         w          x          y          z
1  0.9733651 0.1407513 0.32073105 0.18886833 0.76234111
2  0.9009754 0.1303898 0.48968741 0.45347721 0.78475371
3  0.8460530 0.6597701 0.20024460 0.59079853 0.63302668
4  0.9879135 0.2348028 0.73954442 0.70185877 0.23834780
5  0.5748540 0.4139660 0.79869841 0.02760473 0.99871034
6  0.9164362 0.7166881 0.25280647 0.35890724 0.03500226
7  0.1302808 0.3734517 0.25132321 0.67417021 0.57109357
8  0.1114569 0.7319093 0.57513770 0.11055742 0.86348983
9  0.6596877 0.5261662 0.50796080 0.95685045 0.17689039
10 0.8299933 0.8244658 0.04408135 0.33849748 0.96904940

I need to iterate through each column, and for each day T, count the number of days on which (T-1,T-2,T-3...T-n) < T, for both dataframes simultaneously, then compute the % frequency.

The steps would be:

  1. for example on Day T=2, consider df1[2,1] (which is 0.26030258) and go back and flag any days prior to T=2 that are less than 0.26030258. Since we are using T=2 as an example, the only prior observation is df1[1,1]. If df1[1,1] < df1[2,1] flag this day as 1 IF

  2. df2[1,1] is ALSO less than df2[2,1]

  3. Finally, still for example T=2, sum the number of 1s and divide by the number of observations to generate a frequency for T=2.

Again, I need to do this for 983 dates, and across 27 columns. I have tried various methods using rollify, as well as various functions wrapped in sapply, but it is challenging given the dynamic width of the countif criterion, let alone doing this across 2 DFs at the same time.

Upvotes: 0

Views: 54

Answers (2)

Roman
Roman

Reputation: 4989

For the first df:

df1_result <- matrix(nrow = 10, ncol = 5)

for(j in 1:ncol(df1)){
    for(i in 1:nrow(df1)){
        df1_result[i, j] <- df1 %>%
            filter(df1[ ,j] < df1[i, j] & row_number() < i) %>%
            nrow()
    }
}

Resulting in:

> df1_result
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0    0    0    0    0
 [2,]    1    0    1    0    1
 [3,]    0    1    0    2    0
 [4,]    2    1    1    2    0
 [5,]    4    3    3    2    1
 [6,]    2    2    3    5    2
 [7,]    4    4    6    0    2
 [8,]    4    7    2    1    2
 [9,]    0    3    5    3    5
[10,]    6    7    5    8    9

Will gladly expand when you respond to comments.

Data

set.seed(1701)
df1 <- data.frame(v=runif(10),w=runif(10),x=runif(10),y=runif(10),z=runif(10))
df2 <- data.frame(v=runif(10),w=runif(10),x=runif(10),y=runif(10),z=runif(10))

> df1
             v          w         x          y         z
1  0.127393428 0.85600486 0.4791849 0.55089910 0.9201376
2  0.766723202 0.02407293 0.8265008 0.35612092 0.9279873
3  0.054421675 0.51942589 0.1076198 0.80230714 0.5993939
4  0.561384595 0.20590965 0.2213454 0.73043828 0.1135139
5  0.937597936 0.71206404 0.6717478 0.72341749 0.2472984
6  0.296445079 0.27272126 0.5053170 0.98789408 0.4514940
7  0.665117463 0.66765977 0.8849426 0.04751297 0.3097986
8  0.652215607 0.94837341 0.3560469 0.06630861 0.2608917
9  0.002313313 0.46710461 0.5732139 0.55040341 0.5375610
10 0.661490602 0.84157353 0.5091688 0.95719901 0.9608329

Upvotes: 1

Gregor Thomas
Gregor Thomas

Reputation: 145765

I think something like this:

m1 = as.matrix(df1)
m2 = as.matrix(df2)
results = matrix(nrow = nrow(df1) - 1, ncol = ncol(df1))
colnames(results) = names(df1)

for(i in 2:nrow(df1)) {
  results[i - 1, ] = rowSums(t(m1[1:(i - 1), , drop = FALSE]) < m1[i, ] & t(m2[1:(i - 1), , drop = FALSE]) < m2[i, ]) / (i - 1)
}

results
 #              v   w         x         y         z
 # [1,] 0.0000000 1.0 1.0000000 0.0000000 0.0000000
 # [2,] 0.5000000 0.0 0.0000000 0.0000000 1.0000000
 # [3,] 0.0000000 0.0 0.3333333 0.6666667 0.6666667
 # [4,] 0.2500000 0.0 0.0000000 0.0000000 0.0000000
 # [5,] 0.0000000 0.4 0.4000000 0.6000000 0.6000000
 # [6,] 0.0000000 0.0 0.3333333 0.0000000 0.1666667
 # [7,] 0.0000000 0.0 0.4285714 0.0000000 0.2857143
 # [8,] 0.1250000 0.5 0.6250000 0.5000000 0.8750000
 # [9,] 0.2222222 0.0 0.4444444 0.4444444 0.2222222

There's a bit of guesswork since you haven't responded yet to comments, but this should be easily modifiable in case my assumptions are wrong.

Upvotes: 1

Related Questions