Homer Jay Simpson
Homer Jay Simpson

Reputation: 1280

How can i match the values of a column according to another of a data frame in R using dplyr?

I have two datasets that look like this: The first is :

id gear
A1 A
A2 B
A3 C
A4 D
A5 E
A6 F
A7 G
A8 H
A9 I
A10 G

And the second :

id gear2
A1 A
A4 E
A2 A
A5 E
A13 B
A3 C
A9 I
A8 B
A7 G
A20 G
A21 B
A23 D
A33 E

There two unbalanced data frames.The first data frame is the recorded data set.The second one contains what is known about the gear coming from an id.I want to check the first data frame if what is recorded actually is known or unknown.Specifically i want to check given the id code to check is the gear is the same in both data frames.But individually on each id. Ideally the result must be :

id gear CHECK
A1 A TRUE
A2 B FALSE
A3 C TRUE
A4 D FALSE
A5 E TRUE
A6 F N/A
A7 G TRUE
A8 H FALSE
A9 I TRUE
A10 G N/A
id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1,Gear1);dat1

id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2,Gear2);dat2

How can i do it in R using the dplyr package ? Any help ?

Upvotes: 0

Views: 637

Answers (3)

starja
starja

Reputation: 10365

You can use a left_join and then compare the two columns:

library(dplyr)

dat1 %>% 
  left_join(dat2, by = c("id1" = "id2")) %>% 
  mutate(CHECK = Gear1 == Gear2) %>% 
  select(id = id1, gear = Gear1, CHECK)

# A tibble: 10 × 3
   id    gear  CHECK
   <chr> <chr> <lgl>
 1 A1    A     TRUE 
 2 A2    B     FALSE
 3 A3    C     TRUE 
 4 A4    D     FALSE
 5 A5    E     TRUE 
 6 A6    F     NA   
 7 A7    G     TRUE 
 8 A8    H     FALSE
 9 A9    I     TRUE 
10 A10   G     NA   

Have a look at the dplyr documentation how to use joins.

Edit

Thanks to Ritchie Sacramento for the trick to do the renaming directly in the select function.

Upvotes: 3

RobertoT
RobertoT

Reputation: 1683

library(tidyverse)

dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')

check_data = dat1 %>% 
  full_join(dat2, by='id') %>% 
  mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>% 
  filter(! is.na(Gear1))

Output:

check_data
# A tibble: 10 x 4
   id    Gear1 Gear2 check
   <chr> <chr> <chr> <lgl>
 1 A1    A     A     TRUE 
 2 A2    B     A     FALSE
 3 A3    C     C     TRUE 
 4 A4    D     E     FALSE
 5 A5    E     E     TRUE 
 6 A6    F     NA    NA   
 7 A7    G     G     TRUE 
 8 A8    H     B     FALSE
 9 A9    I     I     TRUE 
10 A10   G     NA    NA   

Upvotes: 0

flopeko
flopeko

Reputation: 159

Try this:

dat1 |> 
  mutate(check = ifelse(!id1 %in% dat2$id2, 
                        NA,
                        ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2), 
                               TRUE, 
                               FALSE)))

Upvotes: 0

Related Questions