scrollex
scrollex

Reputation: 2693

How to divide contents of one column by different values, conditional on contents of a second column?

I've got a data frame that looks like something along these lines:

Day        Salesperson        Value
====      ============        =====
Monday       John              40
Monday       Sarah             50
Tuesday      John              60
Tuesday      Sarah             30
Wednesday    John              50
Wednesday    Sarah             40

I want to divide the value for each salesperson by the number of times that each of the days of the week has occurred. So: There have been 3 Monday, 3 Tuesdays, and 2 Wednesdays — I don't have this information digitally, but can create a vector along the lines of

c(3, 3, 2)

How can I conditionally divide the Value column based on the number of times each day occurs?

I've found an inelegant solution, which entails copying the Day column to a temp column, replacing each of the names of the week in the new column with the number of times each day occurs using

df$temp <- sub("Monday, 3, df$temp)

but doing this seems kinda clunky. Is there a neat way to do this?

Upvotes: 0

Views: 672

Answers (2)

eivicent
eivicent

Reputation: 273

You can use the library dplyr to merge your data frame with the frequency of each day.

df <- data.frame(
  Day=c("Monday","Monday","Tuesday","Tuesday","Wednesday","Wednesday"),
  Salesperson=c("John","Sarah","John","Sarah","John","Sarah"),
  Value=c(40,50,60,30,50,40), stringsAsFactors=F)

aux <- data.frame(
  Day=c("Monday","Tuesday","Wednesday"),
  freq=c(3,3,2)
)

output <- df %>% left_join(aux, by="Day") %>% mutate(Value2=Value/n)

To create this auxiliary table with the count of days that appear in your original data instead of doing it manually. You could use:

aux <- df %>% group_by(Day) %>% summarise(n=n())

> output
        Day Salesperson Value n Value2
1    Monday        John    40 2     20
2    Monday       Sarah    50 2     25
3   Tuesday        John    60 2     30
4   Tuesday       Sarah    30 2     15
5 Wednesday        John    50 2     25
6 Wednesday       Sarah    40 2     20

If you want to substitute the actual valuecolumn, then use mutate(Value=Value/n) and to remove the additional columns, you can add a select(-n)

output <- df %>% left_join(aux, by="Day") %>% mutate(Value=Value/n) %>% select(-n)

Upvotes: 2

Frank
Frank

Reputation: 66819

Suppose your auxiliary data is in another data.frame:

        Day N_Day
1    Monday     3
2   Tuesday     3
3 Wednesday     2

The simplest way would be to merge:

DF_new        <- merge(DF, DF2, by="Day")
DF_new$newcol <- DF_new$Value / DF_new$N_Day

which gives

        Day Salesperson Value N_Day   newcol
1    Monday        John    40     3 13.33333
2    Monday       Sarah    50     3 16.66667
3   Tuesday        John    60     3 20.00000
4   Tuesday       Sarah    30     3 10.00000
5 Wednesday        John    50     2 25.00000
6 Wednesday       Sarah    40     2 20.00000

The mergeless shortcut is

DF$newcol <- DF$Value / DF2$N_Day[match(DF$Day, DF2$Day)]

Data:

DF <- structure(list(Day = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = 
c("Monday", 
"Tuesday", "Wednesday"), class = "factor"), Salesperson = structure(c(1L, 
2L, 1L, 2L, 1L, 2L), .Label = c("John", "Sarah"), class = "factor"), 
Value = c(40L, 50L, 60L, 30L, 50L, 40L)), .Names = c("Day", 
"Salesperson", "Value"), class = "data.frame", row.names = c(NA, 
-6L))

DF2 <- structure(list(Day = structure(1:3, .Label = c("Monday", "Tuesday", 
"Wednesday"), class = "factor"), N_Day = c(3, 3, 2)), .Names = c("Day", 
"N_Day"), row.names = c(NA, -3L), class = "data.frame")

Upvotes: 2

Related Questions