aMarsh

Reputation: 33

R - Vectorizing for loops

I would like to know if and how I could make my code more efficient by using vectorized functions instead of for loops.

I am working on a dataset with around 1.6 million observations. I want to adjust the prices for inflation so I need to match the month of the observation with the month of the corresponding CPI index. I have a main data frame (the one with 1.6 million observations) and a data frame with the CPI index I need (this only has 12 observations, one for each month in the year my analysis is taking place).

Here is how I tried to "match" each observation with its corresponding CPI index:

```r
for (i in 1:nrow(large.data.frame)) {
  for (j in 1:nrow(CPI)) {
    if (months(large.data.frame[i, "Date"]) == months(CPI[j, "Date"])) {
      CPImatch[i] <- CPI[j, 2]
    }
    else next
  }
}
```

NOTE: CPImatch is a separate data frame I was going to use to place the matched values in and then cbind it with my initial data frame. As well, I know there is probably a better way to do this...

Since my code is still running, I know that this is an incredibly inefficient (and maybe even wrong) way of doing what I want to do. Is there a way of vectorizing this loop, possibly with a function from the apply family?

Any feedback is greatly appreciated!

Upvotes: 2

Views: 99

Answers (1)

Richard Telford

Reputation: 9923

Your code can certainly be made much faster. One simple step would be to pre-calculate the months once, rather than recalculating them on every iteration of the loops. Vectorisation will make it even faster. I think the following code should work, mapping each observation's month to the corresponding CPI value, although it is difficult to test without some test data.

library(plyr)
# Map each observation's month name to the CPI value for the same month
CPImatch <- mapvalues(months(large.data.frame$Date), from = months(CPI$Date), to = CPI[, 2])
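If you would rather avoid the extra package, the same lookup can probably be done in base R with `match()`. This is only a minimal sketch on made-up data: the column names `Index` and `Price` and all the values are assumptions for illustration, since the question does not show the real data frames.

```r
# Toy stand-ins for the two data frames in the question (made-up values).
CPI <- data.frame(
  Date  = seq(as.Date("2015-01-01"), by = "month", length.out = 12),
  Index = 100 + 0:11
)
large.data.frame <- data.frame(
  Date  = as.Date(c("2015-03-14", "2015-01-02", "2015-12-31", "2015-03-01")),
  Price = c(10, 20, 30, 40)
)

# Compute the month name once per row, then look up all rows in a single call.
# match() returns, for every observation, the row of CPI with the same month.
idx <- match(months(large.data.frame$Date), months(CPI$Date))

# Index the CPI column with those positions and store the result as a new
# column, so no separate CPImatch object or cbind() step is needed.
large.data.frame$CPImatch <- CPI[[2]][idx]

large.data.frame$CPImatch
# [1] 102 100 111 102
```

Any observation whose month does not appear in `CPI` gets `NA` here, which makes unmatched rows easy to spot. This relies on `CPI` having exactly one row per month, as described in the question.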

Upvotes: 1
