Feng Chen

How to fill different values in a new column based on different values of another column using dplyr?

Here is my data:

    a <- data.frame(x=c('A','A','A','B','B','B'),
                    y=c('Yes','No','No','Yes','No','No'),
                    z=c(1,2,3,4,5,6))

I want to generate a new column this way:

  1. Group by x, so that all the As are in one group and all the Bs in another.
  2. Within each group, if y is Yes, keep that row's z value in the new column; if y is No, use the z value from the row in the same group where y is Yes.

So, the new data should look like this:

    x  y    z  z1
    A  Yes  1  1
    A  No   2  1
    A  No   3  1
    B  Yes  4  4
    B  No   5  4
    B  No   6  4

I can do it this way:

    a1 <- a %>%
      filter(y == 'Yes') %>%
      distinct(x, y, z)
    a2 <- a %>%
      left_join(a1, by = 'x') %>% ...

But this way I have to create a1 as an intermediate object. How can I do this in a single pipeline, without creating a new variable like a1?
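The rule in the numbered steps can also be written directly as a grouped `mutate()`, with no join and no intermediate object. This is a minimal sketch that does not come from the original thread. It assumes each x group contains exactly one 'Yes' row; the `[1]` index is an added assumption that makes the result NA for a group with no 'Yes' row and silently picks the first match if a group has several.

```r
library(dplyr)

a <- data.frame(x = c('A','A','A','B','B','B'),
                y = c('Yes','No','No','Yes','No','No'),
                z = c(1, 2, 3, 4, 5, 6))

# Within each x group, look up the z value on the row where y == 'Yes'
# and recycle it to every row of that group.
# [1] keeps the per-group result at length 1: NA if there is no 'Yes' row,
# the first match if there are several (assumed handling of edge cases).
res <- a %>%
  group_by(x) %>%
  mutate(z1 = z[y == 'Yes'][1]) %>%
  ungroup()

print(res)
```

`ungroup()` is there so later verbs in the pipeline do not keep operating per group.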


Answers (1)

ist123

You can nest the first pipeline inside `left_join()`, so everything happens in a single pipeline.

For example:

    library(dplyr)

    a <- data.frame(x=c('A','A','A','B','B','B'),
                    y=c('Yes','No','No','Yes','No','No'),
                    z=c(1,2,3,4,5,6))

    a %>%
      left_join(a %>% filter(y == 'Yes') %>% distinct(x, y, z), by = 'x') %>%
      select(-y.y)

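Because both data frames contain y and z, the join keeps both copies and suffixes them with .x and .y. `select(-y.y)` drops the second copy of y, leaving x, y.x, z.x and z.y, where z.y is the new column (z1 in the question).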
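If you want the new column to be named z1 directly, without any .x/.y suffixes, one option is to trim and rename the lookup table inside the join so that it only carries the key and the value to copy. This is a sketch, not part of the original answer, and it assumes one 'Yes' row per x group (with several, `left_join()` would duplicate rows of `a`).

```r
library(dplyr)

a <- data.frame(x = c('A','A','A','B','B','B'),
                y = c('Yes','No','No','Yes','No','No'),
                z = c(1, 2, 3, 4, 5, 6))

# The nested pipeline keeps only the join key x and renames z to z1,
# so the join adds exactly one new column and nothing gets a suffix.
res <- a %>%
  left_join(a %>%
              filter(y == 'Yes') %>%
              select(x, z1 = z),
            by = 'x')

print(res)
```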
