jhyu93

Reputation: 3

How to avoid for loop in R when altering a column

I'm working with a data frame that looks very similar to the below:

Image here, unfortunately don't have enough reputation yet

This is a 600,000-row data frame. For every package that is repeated within the same date, I'd like to divide its cost by the total number of repeated instances. I only want to do this for rows that fall under the "Sales" tactic.

For example, on 1/1/16 there are 2 "Help Packages" that are also under the "Sales" tactic. Because there are 2 instances within the same date, I'd like to divide the cost of each by 2 (so the cost would come out as $5 for each).

This is the code I have:

for(i in 1:length(dfExample$Date)){
  if(dfExample$Tactic[i] == "Sales"){
    list = agrep(dfExample$Package[i], dfExample$Package)
    for(i in list){
      date_repeats = agrep(i, dfExample$Date)
      dfExample$Cost[date_repeats] = dfExample$Package[i]/length(date_repeats)
      }
  }
}

It is incredibly inefficient and slow. I know there's got to be a better way to achieve this. Any help would be much appreciated. Thank you!

Upvotes: 0

Views: 54

Answers (2)

jogo

Reputation: 12559

ave() can give a solution without additional packages:

with(dfExample, Cost / ave(Cost, Date, Package, Tactic, FUN=length))
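The one-liner above returns the divided costs without storing them, and it divides within every tactic, not only "Sales". Below is a minimal sketch of one way to write the result back and leave non-"Sales" rows untouched. The sample rows are hypothetical (the original screenshot is not available), and group_size is just an illustrative name:

```r
# Hypothetical sample data; these rows and values are assumptions for illustration only.
dfExample <- data.frame(
  Date    = c("1/1/16", "1/1/16", "1/1/16", "1/2/16", "1/2/16"),
  Package = c("Help Package", "Help Package", "Starter Kit",
              "Help Package", "Help Package"),
  Tactic  = c("Sales", "Sales", "Sales", "Sales", "Marketing"),
  Cost    = c(10, 10, 8, 12, 6),
  stringsAsFactors = FALSE
)

# For each row, the number of rows sharing its Date, Package, and Tactic
group_size <- with(dfExample, ave(Cost, Date, Package, Tactic, FUN = length))

# Divide only the "Sales" rows; other tactics keep their original cost
dfExample$Cost <- ifelse(dfExample$Tactic == "Sales",
                         dfExample$Cost / group_size,
                         dfExample$Cost)

print(dfExample)
```

Because ave() and ifelse() are vectorized, this makes a single pass over the columns instead of looping row by row.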

Upvotes: 3

Gregor Thomas

Reputation: 145870

Using dplyr:

library(dplyr)
dfExample %>%
    group_by(Date, Package, Tactic) %>%
    mutate(Cost = Cost / n())

I'm a little unclear on what you mean by "instance". This groups by Date, Package, and Tactic, so each unique combination of those three columns is treated as its own group. If Tactic isn't part of your definition of an "instance", you can remove it and group only by Date and Package.
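If only the "Sales" rows should be divided, as the question seems to ask, a conditional inside mutate() is one option. This is a sketch on hypothetical sample data (the original screenshot is not available), and result is just an illustrative name:

```r
library(dplyr)

# Hypothetical sample data; these rows and values are assumptions for illustration only.
dfExample <- data.frame(
  Date    = c("1/1/16", "1/1/16", "1/1/16", "1/2/16", "1/2/16"),
  Package = c("Help Package", "Help Package", "Starter Kit",
              "Help Package", "Help Package"),
  Tactic  = c("Sales", "Sales", "Sales", "Sales", "Marketing"),
  Cost    = c(10, 10, 8, 12, 6),
  stringsAsFactors = FALSE
)

result <- dfExample %>%
  group_by(Date, Package, Tactic) %>%
  # n() is the size of the current group; non-"Sales" rows keep their cost
  mutate(Cost = if_else(Tactic == "Sales", Cost / n(), Cost)) %>%
  ungroup()

print(result)
```

The ungroup() at the end drops the grouping so later operations on result are not silently applied per group.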

Upvotes: 1
