Mike Lawrence
Mike Lawrence

Reputation: 1701

Using the value in one column to specify from which row to retrieve a value for a new column

I'm looking for an automated way of converting this:

dat = tribble(
    ~a, ~b, ~c
    , 'x', 1, 'y'
    , 'y', 2, NA
    , 'q', 4, NA
    , 'z', 3, 'q'
)

to:

tribble(
    ~a, ~b, ~d
    , 'x', 1, 2
    , 'z', 3, 4
)

So, the column c in dat encodes which row in dat to look at to grab a value for a new column d, and if c is NA, toss that row from the output. Any tips?

Upvotes: 0

Views: 51

Answers (2)

akrun
akrun

Reputation: 887098

We create the 'd' with lead of 'b' and filter out the NA rows of 'c' and remove the c column with select

library(dplyr)
dat %>%
    mutate(d = lead(b)) %>%
    filter(!is.na(c)) %>%
    select(-c)
# A tibble: 2 x 3
# a         b     d
#  <chr> <dbl> <dbl>
#1 x         1     2
#2 z         3     4

Or more compactly

dat %>%
    mutate(d = replace(lead(b), is.na(c), NA), c = NULL) %>% 
    na.omit

Or with fill

library(tidyr)
dat %>%
   mutate(c1 = c) %>%
   fill(c1) %>% 
   group_by(c1) %>%
   mutate(d = lead(b)) %>% 
   ungroup %>% 
   filter(!is.na(c)) %>% 
   select(-c, -c1)

Or in data.table

library(data.table)
setDT(dat)[, d := shift(b, type = 'lead')][!is.na(c)][, c := NULL][]
#   a b d
#1: x 1 2
#2: z 3 4

NOTE: Both the solutions are simple and doesn't require any joins. Besides, it gives the expected output in the OP's post


Or using match from base R

cbind(na.omit(dat), d = with(dat, b[match(c, a, nomatch = 0)]))[, -3]
#  a b d
#1 x 1 2
#2 z 3 4

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

We can join dat with itself using c and a columns.

library(dplyr)

dat %>%
  inner_join(dat %>% select(-c) %>% rename(d = 'b'), 
             by = c('c' = 'a'))


# A tibble: 2 x 4
#  a         b c         d
#  <chr> <dbl> <chr> <dbl>
#1 x         1 y         2
#2 z         3 q         4

In base R, we can do this with merge :

merge(dat, dat[-3], by.x = 'c', by.y = 'a')

Upvotes: 1

Related Questions