Improve speed of r 'for loop'

Question

I'm fairly new to r and would like some help in speeding up the following 'for loop' code.

The code aims to create a percentile rank for each row, based on the ROE column, by Sector and Month

I'm replicating an excel spreadsheet in r, the spreadsheet uses the PERCENTRANK.INC function and the r code needs to exactly replicate it. I've investigated options in r to match this function, a dplyr method etc. but none seem to exactly replicate the results except for the loop below.

The crux of the problem is that the loop is taking 30 minutes to create all the percentile ranks for the dataframe (total rows in the input dataframe is around 90,000). Has anyone got any tips to speed up the below loop? I've read a number of the similar questions/answers on this site and tried a number of things, such as making adjustments to the subset statements near the top of the loop, however have only succeeded in limited improvements.

Details on the input file 'ROE_Quintiles'

It consists of historical stock ROE values, by month
Key columns are:
Merge_Var4, which is a combination of the Sector the stock belongs to and the month e.g. Automobiles & ComponentsDec-03
ROE2 is the column the code is getting the percentile rank for (by Sector and Month)
Merge_Var3 is a combination of the stock, sector it belongs to and month e.g. Ford Motor CoAutomobiles & ComponentsDec-03
ROE2_percrank is the percentile rank which is output at the end of each iteration

Thanks a lot for your help.

SMG

My r code is shown below:

# Create dataframe to append to at the end of each iteration
ROE_Quintiles3 <- data.frame("Merge_Var3" = c('Temp'), "ROE2_percrank" = c(0.5))
End <- nrow(ROE_Quintiles)
system.time({
  for(i in 1:End) {
    Row <- ROE_Quintiles[i,]
    Row_Value <- subset(Row, select=c(ROE2))
    Row_Value2 <- mean(Row_Value$ROE2) # PercentRankArgument Value

    Row_Sector_Month <- subset(Row, select=c(Merge_Var4))
    Row_Sector_Month_Values <- subset(ROE_Quintiles, Merge_Var4==Row_Sector_Month$Merge_Var4, select=c(ROE2))

    # Filter Number to values less than the row value
    NumberLessThanArgument = subset(Row_Sector_Month_Values, ROE2 < Row_Value2)

    # Filter Number to values greater than or equal to the row value
    NumberGreaterThanOrEqualArgument = subset(Row_Sector_Month_Values, ROE2 >= Row_Value2)

    # RankLower = the count of Numbers less than row value, and is used later for 
    # interpolation of ranks
    RankLower <- nrow(NumberLessThanArgument)

    # NumberLower = the largest Number < row value, used for interpolation
    NumberLower <- ifelse(RankLower==0, Row_Value2, max(NumberLessThanArgument))

    # NumberUpper = the smallest Number >= row value, used for interpolation
    NumberUpper = min(NumberGreaterThanOrEqualArgument) 

    # PercentRankArgumentRank =  the rank of row value over the Number table, which is 
    # just RankLower + 1. This is the same rank as NumberUpper in the Number table itself.
    PercentRankArgumentRank = RankLower + 1

    # InterpolationFraction = fraction that row value is from NumberLower to NumberUpper  
    InterpolationFraction <- ifelse(RankLower==0, 0, (Row_Value2 - NumberLower)/(NumberUpper - NumberLower))

    # Calculate the interpolated rank
    RankInterpolated = max(1, RankLower + InterpolationFraction * (PercentRankArgumentRank - RankLower))

    # Get the count of Numbers
    NumberCount = nrow(Row_Sector_Month_Values)

    # Final PercentRank is (RankInterpolated - 1)/(NumberCount - 1)
    PercentRankOutput = (RankInterpolated - 1)/(NumberCount - 1)

    # Append to create main dataframe
    Row_Output <- subset(Row, select=c(Merge_Var3))
    Row_Output$ROE2_percrank <- PercentRankOutput
    ROE_Quintiles3 <- rbind(ROE_Quintiles3, Row_Output)
  }
})
ROE_Quintiles3 <- subset(ROE_Quintiles3, Merge_Var3 != 'Temp')

Improve speed of r 'for loop'

Answers (1)

Related Questions

Improve speed of r &#39;for loop&#39;

Answers (1)

Related Questions

Improve speed of r 'for loop'