PoteHatesBugs

Reputation: 5

Populate a data frame with a value from a vector when the value of another meets a criterion

I'm working with a phenology model and the method I'm currently using relies on determining the RMSE using different combinations of minimum temperatures and degree days required.

Basically I have this:

Title:DDcum

Day  min32    min33    min34
1    0.7904   0.7488   0.7072
2    1.6224   1.5392   1.456
3    2.47104  2.34624  2.22144
4    3.31968  3.15328  2.98688
5    4.16832  3.96032  3.75232
6    5.00864  4.75904  4.50944
7    5.61184  5.32064  5.02944
8    6.0112   5.6784   5.3456
9    6.36064  5.98624  5.61184
10   6.64768  6.23168  5.81568
11   6.99296  6.53536  6.07776

These are accumulated heat units at arbitrary minimum temperatures (the real data covers hundreds of days and minima up to 50).

I want to populate a new data frame similar to the following:

Title:dpi

DDreq   days32  days33  days34
0       1       1       1
1       2       2       2
2       3       3       3
3       4       4       4
4       5       6       6
5       6       7       7
6       8       10      11

The column DDreq is the desired value (in this case, the minimum degree day accumulation), and each entry in days32 through days34 is the first day on which the cumulative value in the corresponding column of DDcum exceeded DDreq.

For clarification: the first entry in dpi$days32 is 1 because the value of DDcum$min32 was > 0 on day 1. The final entry in dpi$days34 is 11 because the value of DDcum$min34 didn't exceed 6 until day 11.

The only way I can think of to do this is to subset DDcum for each combination of minimum and DDreq and call the min function dozens of times.

I have done this previously in Excel using the lookup function, and I have found several similar strategies in R, but nothing that quite does what I need. I'm sure Excel converts have asked this question before, but I would appreciate any help.

Upvotes: 0

Views: 116

Answers (2)

alexis_laz

Reputation: 13122

A way that looks valid with the sample data (I guess you have a mistake in "days34"? min34 is 2.98688 on day 4, so it does not exceed 3 until day 5):

sapply(DDcum[-1], function(x) findInterval(DDreq, x) + 1)
#     min32 min33 min34
#[1,]     1     1     1
#[2,]     2     2     2
#[3,]     3     3     3
#[4,]     4     4     5
#[5,]     5     6     6
#[6,]     6     7     7
#[7,]     8    10    11

Where

DDcum = structure(list(Day = 1:11, min32 = c(0.7904, 1.6224, 2.47104, 
3.31968, 4.16832, 5.00864, 5.61184, 6.0112, 6.36064, 6.64768, 
6.99296), min33 = c(0.7488, 1.5392, 2.34624, 3.15328, 3.96032, 
4.75904, 5.32064, 5.6784, 5.98624, 6.23168, 6.53536), min34 = c(0.7072, 
1.456, 2.22144, 2.98688, 3.75232, 4.50944, 5.02944, 5.3456, 5.61184, 
5.81568, 6.07776)), .Names = c("Day", "min32", "min33", "min34"
), class = "data.frame", row.names = c(NA, -11L))

DDreq = 0:6 
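
To spell out why this works, here is a minimal sketch on the min34 column alone. It assumes the column is sorted in non-decreasing order, which holds for a cumulative sum of non-negative values. The extra requirement of 7 and the name first_day are assumptions added for illustration, to show what happens when a requirement is never exceeded.

```r
# One cumulative column from the sample data (min34)
Day <- 1:11
min34 <- c(0.7072, 1.456, 2.22144, 2.98688, 3.75232, 4.50944,
           5.02944, 5.3456, 5.61184, 5.81568, 6.07776)
DDreq <- 0:7  # 7 is never exceeded in this sample (illustrative addition)

# findInterval() counts how many values of the sorted vector are <= each
# requirement, so adding 1 gives the position of the first value that is
# strictly greater.
idx <- findInterval(DDreq, min34) + 1

# Map positions back to Day; a position past the end of the data
# (requirement never exceeded) indexes to NA.
first_day <- Day[idx]
print(first_day)
```

Note that the sapply call above returns row positions rather than values of Day, so the two only agree when Day runs 1, 2, 3, ... without gaps; indexing Day as in this sketch avoids relying on that.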

Upvotes: 2

MrFlick

Reputation: 206167

You can iterate over the req values and the columns with:

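req <- 0:6
# For each required value, find the first Day on which each cumulative
# column exceeds it; t() gives one row per required value.
dpi <- data.frame(DDreq = req,
    t(sapply(req, function(i) {
        sapply(DDcum[-1], function(x)
            DDcum$Day[which(x > i)[1]]
        )
    }))
)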

I'm not sure what you want the behavior to be if a column never reaches a required value, but right now this will return NA for that entry.
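
As a rough illustration of that NA case, the sketch below uses a small made-up data frame (the columns minA and minB and the helper first_day are hypothetical, not from the question) in which one column stops short of the larger requirements. It then picks out the rows that contain an NA so they can be handled however suits the model.

```r
# Toy data: two cumulative columns (values made up for illustration)
DDcum <- data.frame(Day  = 1:4,
                    minA = c(0.5, 1.5, 2.5, 3.5),
                    minB = c(0.4, 0.8, 1.2, 1.6))
req <- 0:3

# Hypothetical helper: the day on which x first exceeds the threshold,
# or NA when it never does (which() returns nothing, and [1] yields NA).
first_day <- function(x, threshold, day) day[which(x > threshold)[1]]

dpi <- data.frame(
  DDreq = req,
  sapply(DDcum[-1], function(x)
    sapply(req, function(i) first_day(x, i, DDcum$Day)))
)

# Rows where at least one column never reached the requirement
never_reached <- dpi[!complete.cases(dpi), ]
print(dpi)
print(never_reached)
```

Here minB never exceeds 2 or 3, so those two rows carry NA in the minB column and show up in never_reached.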

Upvotes: 0
