nitpicks
nitpicks

Reputation: 31

Unable to figure out nested for loops in R

I am unable to figure out how to use nested for loops in R for solving my problem. Here's a miniature version of what I'm trying to solve:

I have two files, test1 and test2 which look like this:

head(test1)

      Date Settlement
2008-08-28     138.29
2008-08-29     135.34
2008-09-01     135.23
2008-09-02     123.36
2008-09-03     126.41
2008-09-04     128.68
2008-09-05     123.70
2008-09-08     124.60
2008-09-09     122.33
2008-09-10     120.85
2008-09-11     120.15
2008-09-12     121.17
2008-09-15     118.97
2008-09-16     114.90
2008-09-17     115.78
2008-09-18     115.60
2008-09-19     115.90
2008-09-22     120.49
2008-09-23     124.10

And here is test2:

test2

X1         X2         X3
2008-08-31 2008-09-05 2008-09-11
2008-09-05 2008-09-11 2008-09-14
2008-09-11 2008-09-14 2008-09-18
2008-09-14 2008-09-18 2009-09-22

The logic that I need to put in is:

  1. Select Dates [1,1] and [1,2] from test2
  2. Find all Settlement Prices between those 2 dates in test1
  3. Get average of those prices, place it in [1,1] of a new dataframe.
  4. Repeat by increasing columns, and then rows in pt1.

The end-result of this would look like this:

X1          X2
128.42  122.87
122.87  120.66
120.66  116.55
116.55  115.75

So, the 1st value in X1 is an average of Settlement prices between 31-Aug-08 (including) and 5-Sep-08 (excluding), and the 1st value in X2 is an average of Settlement prices between 5-Sep-08 (including) and 11-Sep-08 (excluding), and so on for the rows below.

Here's my code that works (if I pass it fixed dates from test2 as given below):

temp1 <- test1 %>%
  group_by(Date >= test2$X1[1] & Date < test2$X2[1]) %>%
  summarise(AvgPrice2 = mean(Settlement, na.rm = T))

temp1 <- filter(temp1, temp1[,1]==TRUE)

However, no matter what I try (over last 3 days !) I cannot figure out how to put this into a for loop. Even tried rollapply, sapply...not able to get anything to work. The code need not be time efficient, I just need to automate this process.

I have been working with R for sometime, but clearly this is a problem for advanced users...Would deeply appreciate any help on this.

Many thanks in advance.

Upvotes: 2

Views: 85

Answers (4)

nitpicks
nitpicks

Reputation: 31

Thanks a lot for all the answers, I tried all of them, but none seemed to fit my needs given that the files above were a miniaturized version of actual files - so coding by column names / splitting data manually into rows didn't seem like a good option for me.

But I finally figured out what'll work nicely in this case:

  library(lubridate)
  Makingrows <- function(test1, test2, j){
     res<<- NULL
     m1 = nrow(test2)

     for(i in 1:m1){
        d1 <- ymd(test2[i,j])
        d2 <- ymd(test2[i,j+1])
        X1 <- filter(test1, Date < d2 & Date >= d1)
        res[i] <- mean(X1$Settlement, na.rm = T)
  }
  return(res)
}

 mcol1 <- ncol(test2)-1
 finalres <- lapply(1:mcol1, function(x) Makingrows(test1, test2, x))
 finalres <- as.data.frame(finalres)

And yes, I was also getting the last value as 119.02...and I realized that by mistake I put the year as 2009 in the last cell in test2 file. Due to this, the code was picking up all the values till the end.

Thanks a lot everyone. I hope you'll agree with me as I mark this as the answer to my question.

Upvotes: 0

CPak
CPak

Reputation: 13581

Your data

Ensuring dates are Dates

library(lubridate)

test1 = data.frame(Date = ymd(c("2008-08-28", "2008-08-29", "2008-09-01", "2008-09-02", "2008-09-03", "2008-09-04", "2008-09-05", "2008-09-08", "2008-09-09", "2008-09-10", "2008-09-11", "2008-09-12", "2008-09-15", "2008-09-16", "2008-09-17", "2008-09-18", "2008-09-19", "2008-09-22", "2008-09-23")), 
            Settlement = c(138.29, 135.34, 135.23, 123.36, 126.41, 128.68, 123.70, 124.60, 122.33, 120.85, 120.15, 121.17, 118.97, 114.90, 115.78, 115.60, 115.90, 120.49, 124.10))

test2 = data.frame(d1=ymd(c("2008-08-31", "2008-09-05", "2008-09-11", "2008-09-14")),
            d2=ymd(c("2008-09-05", "2008-09-11", "2008-09-14", "2008-09-18")),
            d3=ymd(c("2008-09-11", "2008-09-14", "2008-09-18", "2009-09-22")))

tidyverse solution

library(tidyverse)
result <- map_df(1:nrow(test2), ~data.frame(X1=(filter(test1, Date >= test2$d1[.x] & Date < test2$d2[.x]) %>% summarise(m=mean(Settlement)))$m,
                                            X2=(filter(test1, Date >= test2$d2[.x] & Date < test2$d3[.x]) %>% summarise(m=mean(Settlement)))$m)) 

Output

      X1       X2
1 128.42 122.8700
2 122.87 120.6600
3 120.66 116.5500
4 116.55 119.0225

Upvotes: 0

Bruno Zamengo
Bruno Zamengo

Reputation: 860

I would use an SQL-like approach through the sqldf package (which lets you to apply SQL sintax to your data.frames

ds = data.frame(Date = c("2008-08-28", "2008-08-29", "2008-09-01", "2008-09-02", "2008-09-03", "2008-09-04", "2008-09-05", "2008-09-08", "2008-09-09", "2008-09-10", "2008-09-11", "2008-09-12", "2008-09-15", "2008-09-16", "2008-09-17", "2008-09-18", "2008-09-19", "2008-09-22", "2008-09-23"), 
                Settlement = c(138.29, 135.34, 135.23, 123.36, 126.41, 128.68, 123.70, 124.60, 122.33, 120.85, 120.15, 121.17, 118.97, 114.90, 115.78, 115.60, 115.90, 120.49, 124.10))

dr = data.frame(d1=c("2008-08-31", "2008-09-05", "2008-09-11", "2008-09-14"),
                d2=c("2008-09-05", "2008-09-11", "2008-09-14", "2008-09-18"),
                d3=c("2008-09-11", "2008-09-14", "2008-09-18", "2009-09-22"))
# add a variable which I will use to identify the rows
dr$g = 1:NROW(dr);


library(sqldf);
output = sqldf("SELECT dr.g, AVG(s1.Settlement) AS X1, AVG(s2.Settlement) AS X2
                FROM dr 
                    JOIN ds AS s1 ON dr.d1 <= s1.Date AND s1.Date < dr.d2
                    JOIN ds AS s2 ON dr.d2 <= s2.Date AND s2.Date < dr.d3
                GROUP BY dr.g");

I found the suggested package in this post. In the same post another user suggested the use of the data.table package but I don't feel as confident on data.table sintax as the SQL one :)

The documentation of sqldf and some usage example can be found on GitHub project page

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76402

I'm not sure I got it, one of my results is different from the one in your wanted output. First, make sure the dates are of class Date.

test1$Date <- as.Date(test1$Date)
test2$X1 <- as.Date(test2$X1)
test2$X2 <- as.Date(test2$X2)
test2$X3 <- as.Date(test2$X3)

Now, for the computations you've described.

res1 <- numeric(nrow(test2))
res2 <- numeric(nrow(test2))
for(i in seq_len(nrow(test2))){
    inx <- test2$X1[i] <= test1$Date & test1$Date < test2$X2[i]
    res1[i] <- mean(test1$Settlement[inx])
    inx <- test2$X2[i] <= test1$Date & test1$Date < test2$X3[i]
    res2[i] <- mean(test1$Settlement[inx])
}

result <- data.frame(X1 = res1, X2 = res2)
result
      X1       X2
1 128.42 122.8700
2 122.87 120.6600
3 120.66 116.5500
4 116.55 119.0225

The value that is different is the very last one, result$X2[4]. Your output is 115.75 and here it's 119.0225.

Upvotes: 0

Related Questions