BillyJean

Reputation: 1577

Substitute value in data frame based on a condition

I have the following data set

library(dplyr)


df <- data.frame(name = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
                 year = c(1,    1,   2,   2,   2,   3,   1,   2,   2,   2,   3,   3),
                 val  = c(25,   75,  20,  40,  60,  50,  20,  10,  20,  30,  40,  60))

We summarize df by grouping on name and year, then computing the average of val and the number of entries in each group:

asd <- (df %>%
         group_by(name,year) %>%
         summarize(average = mean(val), `ave_number` = n()))

This gives the following desired output

    name  year average ave_number
  <fctr> <dbl>   <dbl>      <int>
1      a     1      50          2
2      a     2      40          3
3      a     3      50          1
4      b     1      20          1
5      b     2      20          3
6      b     3      50          2

Now I would like to replace every entry of asd$average where asd$ave_number < 2 with a value from the following lookup table, matched on year:

replacer<- data.frame(c(1,2,3),
                c(100,200,300))
colnames(replacer)<-c("year", "average")

In other words, I would like to end up with

    name  year average ave_number
  <fctr> <dbl>   <dbl>      <int>
1      a     1      50          2
2      a     2      40          3
3      a     3     300          1 #substituted
4      b     1     100          1 #substituted
5      b     2      20          3
6      b     3      50          2

Is there a way to achieve this with dplyr? I guess I have to use the %>%-operator, something like this (not working code)

asd %>%
  group_by(name, year) %>% 
  summarize(average = ifelse(n() < 2, #SOMETHING#, mean(val)))

Upvotes: 0

Views: 67

Answers (2)

Phil

Reputation: 8107

Here's what I would do:

colnames(replacer) <- c("year", "average_replacer") # avoid a duplicate column name after the join
asd <- left_join(asd, replacer, by = "year") %>%
  mutate(average = ifelse(ave_number < 2, average_replacer, average)) %>%
  select(-average_replacer)

    name  year average ave_number
  <fctr> <dbl>   <dbl>      <int>
1      a     1      50          2
2      a     2      40          3
3      a     3     300          1
4      b     1     100          1
5      b     2      20          3
6      b     3      50          2

Regarding the following:

I guess I have to use the %>%-operator

You never have to use the pipe operator. It is a convenience: it lets you chain (or "pipe") functions one after another, passing each result as the first argument of the next call, so the code reads in the order the steps happen.
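To illustrate that the pipe is optional, here is a sketch of the same join-based replacement written with intermediate variables instead of %>%. It assumes dplyr is installed. The names grouped, joined, fixed and result are made up for this illustration, and the replacement column is called average_replacer as in the code above.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  name = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  year = c(1, 1, 2, 2, 2, 3, 1, 2, 2, 2, 3, 3),
  val  = c(25, 75, 20, 40, 60, 50, 20, 10, 20, 30, 40, 60)
)
replacer <- data.frame(year = c(1, 2, 3),
                       average_replacer = c(100, 200, 300))

# Same steps as the piped version, one function call per line
grouped <- group_by(df, name, year)
asd     <- ungroup(summarize(grouped, average = mean(val), ave_number = n()))
joined  <- left_join(asd, replacer, by = "year")
fixed   <- mutate(joined,
                  average = ifelse(ave_number < 2, average_replacer, average))
result  <- select(fixed, -average_replacer)

print(result)
```

Both styles should produce the same table. The piped version simply avoids naming each intermediate result.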

Upvotes: 1

Jake Kaupp

Reputation: 8072

You can do this easily by using a named vector of replacement values by year instead of a data frame. If you're set on a data frame, you'd be using joins.

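# Named vector: values are the replacement averages, names are the years
replacer <- setNames(c(100, 200, 300), c(1, 2, 3))

asd <- df %>%
  group_by(name, year) %>%
  summarize(average = mean(val),
            ave_number = n()) %>%
  # Look up by name (as.character) so the match does not depend on position
  mutate(average = if_else(ave_number < 2,
                           unname(replacer[as.character(year)]),
                           average))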


Source: local data frame [6 x 4]
Groups: name [2]

    name  year average ave_number
  <fctr> <dbl>   <dbl>      <int>
1      a     1      50          2
2      a     2      40          3
3      a     3     300          1
4      b     1     100          1
5      b     2      20          3
6      b     3      50          2
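One caveat with named-vector lookups: a numeric index selects by position, while a character index selects by name. With years 1, 2, 3 the two coincide, but they diverge for other year values. The sketch below uses base R only. The calendar years 2015 to 2017 and the names replacer_df, lookup, lookup_cal, by_position and by_name are hypothetical, chosen just to show the difference. It also shows one way to build the named vector from the question's data frame.

```r
# The question's replacement table, as a data frame
replacer_df <- data.frame(year = c(1, 2, 3), average = c(100, 200, 300))

# Convert it to a named vector: values are averages, names are years
lookup <- setNames(replacer_df$average, replacer_df$year)

# Hypothetical calendar years, where position and name no longer coincide
lookup_cal <- setNames(c(100, 200, 300), c(2015, 2016, 2017))
year <- c(2015, 2017)

# Numeric index: treated as positions 2015 and 2017, which are out of range
by_position <- lookup_cal[year]

# Character index: matched against the vector's names
by_name <- unname(lookup_cal[as.character(year)])

print(by_position)
print(by_name)
```

Indexing with as.character(year) keeps the lookup correct whatever the year values are, which is why it is the safer habit inside mutate().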

Upvotes: 1
