Reputation: 1280
I have two datasets that look like this: The first is :
id | gear |
---|---|
A1 | A |
A2 | B |
A3 | C |
A4 | D |
A5 | E |
A6 | F |
A7 | G |
A8 | H |
A9 | I |
A10 | G |
And the second :
id | gear2 |
---|---|
A1 | A |
A4 | E |
A2 | A |
A5 | E |
A13 | B |
A3 | C |
A9 | I |
A8 | B |
A7 | G |
A20 | G |
A21 | B |
A23 | D |
A33 | E |
There two unbalanced data frames.The first data frame is the recorded data set.The second one contains what is known about the gear coming from an id.I want to check the first data frame if what is recorded actually is known or unknown.Specifically i want to check given the id code to check is the gear is the same in both data frames.But individually on each id. Ideally the result must be :
id | gear | CHECK |
---|---|---|
A1 | A | TRUE |
A2 | B | FALSE |
A3 | C | TRUE |
A4 | D | FALSE |
A5 | E | TRUE |
A6 | F | N/A |
A7 | G | TRUE |
A8 | H | FALSE |
A9 | I | TRUE |
A10 | G | N/A |
id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1,Gear1);dat1
id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2,Gear2);dat2
How can i do it in R using the dplyr package ? Any help ?
Upvotes: 0
Views: 637
Reputation: 10365
You can use a left_join
and then compare the two columns:
library(dplyr)
dat1 %>%
left_join(dat2, by = c("id1" = "id2")) %>%
mutate(CHECK = Gear1 == Gear2) %>%
select(id = id1, gear = Gear1, CHECK)
# A tibble: 10 × 3
id gear CHECK
<chr> <chr> <lgl>
1 A1 A TRUE
2 A2 B FALSE
3 A3 C TRUE
4 A4 D FALSE
5 A5 E TRUE
6 A6 F NA
7 A7 G TRUE
8 A8 H FALSE
9 A9 I TRUE
10 A10 G NA
Have a look at the dplyr
documentation how to use joins.
Thanks to Ritchie Sacramento for the trick to do the renaming directly in the select
function.
Upvotes: 3
Reputation: 1683
library(tidyverse)
dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')
check_data = dat1 %>%
full_join(dat2, by='id') %>%
mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>%
filter(! is.na(Gear1))
Output:
check_data
# A tibble: 10 x 4
id Gear1 Gear2 check
<chr> <chr> <chr> <lgl>
1 A1 A A TRUE
2 A2 B A FALSE
3 A3 C C TRUE
4 A4 D E FALSE
5 A5 E E TRUE
6 A6 F NA NA
7 A7 G G TRUE
8 A8 H B FALSE
9 A9 I I TRUE
10 A10 G NA NA
Upvotes: 0
Reputation: 159
Try this:
dat1 |>
mutate(check = ifelse(!id1 %in% dat2$id2,
NA,
ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2),
TRUE,
FALSE)))
Upvotes: 0