Alternatives to MAPPLY in R

Question

I have a data frame with the following:

1) Store 2) DayOfWeek
3) Date 4) Sales
5) Customers
6) Open 7) Promo
8) StateHoliday 9) SchoolHoliday
10) StoreType
11) Assortment
12) CompetitionDistance 13) CompetitionOpenSinceMonth
14) CompetitionOpenSinceYear
15) Promo2
16) Promo2SinceWeek 17) Promo2SinceYear 18) PromoInterval
19) CompanyDistanceBin
20) CompetitionOpenSinceDate
21) DaysSinceCompetionOpen

I am trying to calculate the Average Sales for the Previous Quarter based on the date (basically date - 3 months). But, I need to also subset based on DayOfWeek and Promo. I have written a function and am using mapply.

quarter.store.sales.func <- function(storeId, storeDate, dayofweekvar, promotion)
{   
    storeDate = as.Date(storeDate,"%Y-%m-%d")
    EndDate = ymd(as.Date(storeDate)) + ddays(-1)
    EndDate = as.Date(storeDate,"%Y-%m-%d")
    StartDate = ymd(storeDate + months(-3))
    StartDate = as.Date(StartDate)

    quarterStoresales <- subset(saleswithstore, Date >= StartDate & Date <= EndDate & Store == storeId & DayOfWeek == dayofweekvar & Promo == promotion)
    quarterSales = 0
    salesDf <- ddply(quarterStoresales,.(Store),summarize,avgSales=mean(Sales))  

    if (nrow(salesDf)>0)
      quarterSales = as.numeric(round(salesDf$avgSales,digits=0))     

    return(quarterSales)
}

saleswithstore$QuarterSales <- mapply(quarter.store.sales.func, saleswithstore$Store, saleswithstore$Date, saleswithstore$DayOfWeek, saleswithstore$Promo)

 head(exampleset)
           Store         DayOfWeek Date               Sales           Promo
186            1                3  2013-06-05         5012            1
296            1                3  2013-04-10         4903            1
337            1                3  2013-05-29         5784            1
425            1                3  2013-05-08         5230            0
449            1                3  2013-04-03         4625            0
477            1                3  2013-03-27         6660            1

saleswithstore is a dataframe that has 1,000,000 rows. So, this solution is not workable because it performing badly and taking forever. Is there a better, more efficient way to have a specific subset on a dataframe like this and then and then take an average like I am trying to do here?

I am open to any suggestions. I admittedly am new to R.

Alternatives to MAPPLY in R

Answers (1)

Related Questions