user3612324
user3612324

Reputation: 57

Creating new columns based on conditions of the existing columns

I have a data.table which contains the following values

structure(list(Item = c("10040", "10040", "10110", "10190", "10190", 
"12020", "12970", "7010040"), Customer = c("CLUB RETAILER - 106", 
"WHOLESALER - 112", "NATIONAL RETAILER - 102", "MID WEST 1 - 120", 
"WHOLESALER - 112", "WHOLESALER - 112", "WHOLESALER - 112", "DIAMLER BENZ - 108"
), DemandID = c("NetBaseForecast", "NetBaseForecast", "NetBaseForecast", 
"NetBaseForecast", "NetBaseForecast", "NetBaseForecast", "NetBaseForecast", 
"NetBaseForecast"), Forecasts = c(5, 158, 212, 12, 3, 3, 3, 2776
), RTF = c(1, 98, 139, 8, 0, 0, 0, 2356)), row.names = c(NA, 
-8L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x00000000025c1ef0>,     sorted = c("Item", "Customer"))

I need to create 2 new columns based on the forecast and rtf columns based on 2 conditions of each row for the Forecasts and RTF columns. I am using a data.table

  1. If Forecast > RTF
    • New_F = Forecast - RTF
    • Com_F = RTF
  2. If Forecast < RTF
    • New_F = 0
    • Com_F = Forecast

I am currently using a for loop like this

for(i in 1:nrow(temp1NoOrders)){
  if (temp1NoOrders$Forecasts[i] >= temp1NoOrders$RTF[i]){
   temp1NoOrders$Com_F[i] <- temp1NoOrders$RTF[i]
   temp1NoOrders$New_F[i] <- temp1NoOrders$Forecasts[i] - temp1NoOrders$RTF[i]
 }
  else if (temp1NoOrders$Forecasts[i] < temp1NoOrders$RTF[i]){
   temp1NoOrders$Com_F[i] <- temp1NoOrders$Forecast[i]
   temp1NoOrders$New_F[i] <- 0
 }
}

Is there a faster way of doing this with data.table? I am assuming that if I use a 100k row dataset, this might become slower because of the loop and the if-else statements.

Upvotes: 1

Views: 59

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388797

You don't need to do this line by line. Use fifelse in data.table to check for conditions.

library(data.table)

temp1NoOrders[, c("New_F","Com_F") := 
              .(fifelse(Forecasts > RTF, Forecasts - RTF, 0), 
                fifelse(Forecasts > RTF, RTF, Forecasts))] 

temp1NoOrders
#      Item                Customer        DemandID Forecasts  RTF New_F Com_F
#1:   10040     CLUB RETAILER - 106 NetBaseForecast         5    1     4     1
#2:   10040        WHOLESALER - 112 NetBaseForecast       158   98    60    98
#3:   10110 NATIONAL RETAILER - 102 NetBaseForecast       212  139    73   139
#4:   10190        MID WEST 1 - 120 NetBaseForecast        12    8     4     8
#5:   10190        WHOLESALER - 112 NetBaseForecast         3    0     3     0
#6:   12020        WHOLESALER - 112 NetBaseForecast         3    0     3     0
#7:   12970        WHOLESALER - 112 NetBaseForecast         3    0     3     0
#8: 7010040      DIAMLER BENZ - 108 NetBaseForecast      2776 2356   420  2356

fifelse is easy to understand but more efficient approach would be to use pmax and pmin :

temp1NoOrders[, c("New_F","Com_F") := .(pmax(Forecasts - RTF, 0), 
                                        pmin(Forecasts, RTF))]

Upvotes: 5

Related Questions