Reputation: 27
I have two data frames stu1 and stu2. Both have matching ID columns but different variables in other columns.
For example, stu1
:
ID, Grade, Group, Age
ad1, A, Green, 14
bc1, B, Green, 13
cd1, B, Blue, 14
fs3, C, Red, 13
stu2
:
ID, Prog, Loc, Year
bc1, LSC1, Ext, 2013
cd1, LSC1, Ext, 2013
cd1, BSC1, Int, 2013
ad1, BSC2, Int, 2012
rs2, KHL4, Ext, 2014
What I'm trying to do is check whether the student ID in stu1
exists in stu2
then check whether the text in another column for the corresponding row matches my string, e.g. Prog =='BSC*'
then create a new column in stu1
which states "Yes" or "No".
So, the result for stu1
should be:
ID, Grade, Group, Age, BSCProg
ad1, A, Green, 14, Yes
bc1, B, Green, 13, No
cd1, B, Blue, 14, Yes
fs3, C, Red, 13, No
I've tried a number of different ways unsuccessfully, e.g:
stu1$BSCProg <- ifelse(stu2[grepl("BSC", stu2$Prog) & match(paste0(stu1$ID),
paste0(stu1$ID)),], "Yes", "No")
stu1$BSCProg <- ifelse(is.na(match(paste0(stu1$ID),
paste0(stu2$ID) & stu2[grepl("BSC", stu2$Prog),])),"No","Yes")
stu1$BSCProg <- ifelse(stu1$ID %in% stu2$ID & grepl('BSC', stu2$Prog), "Yes", "No")
Upvotes: 0
Views: 1501
Reputation: 39154
A dplyr
and tidyr
solution. stu3
is the final output.
library(dplyr)
library(tidyr)
stu1 <- data_frame(ID = c("ad1", "bc1", "cd1", "fs3"),
Grade = c("A", "B", "B", "C"),
Group = c("Green", "Green", "Blue", "Red"),
Age = c(14, 13, 14, 13))
stu2 <- data_frame(ID = c("bc1", "cd1", "cd1", "ad1", "rs2"),
Prog = c("LSC1", "LSC1", "BSC1", "BSC2", "KHL4"),
Loc = c("Ext", "Ext", "Int", "Int", "Ext"),
Year = c(2013, 2013, 2013, 2012, 2014))
stu3 <- stu1 %>%
full_join(stu2 %>% select(ID, Prog), by = "ID") %>%
mutate(BSCProg = ifelse(grepl("BSC", Prog), "Yes", "No")) %>%
drop_na(Grade) %>%
select(-Prog) %>%
group_by(ID) %>%
arrange(desc(BSCProg)) %>%
slice(1)
Upvotes: 0
Reputation: 3230
I would do this by merging the two tables so that you can do column comparisons. Using data.table
:
library(data.table)
setDT(stu1)
setDT(stu2)
dat <- merge(stu1,
stu2[Prog %like% "BSC", .(ID, BSCProg = Prog)],
by = "ID",
all.x = TRUE)
dat[, BSCProg := ifelse(is.na(BSCProg), "No", "Yes")]
Result:
# ID Grade Group Age BSCProg
# 1: ad1 A Green 14 Yes
# 2: bc1 B Green 13 No
# 3: cd1 B Blue 14 Yes
# 4: fs3 C Red 13 No
Unpack that a bit, the first step is to merge the ID
and Prog
columns from stu2
into stu1
. The Prog %like% "BSC"
section will merge only those rows where the Prog
column has "BSC" as part of the value. BSCProg = Prog
is to rename the column to what you want in the end.
When this is done, the values of the column will be NA
or values like BSC1
, BSC2
. The final statement BSCProg := ifelse(is.na(BSCProg), "No", "Yes")
will change any NA
to "No" and turn anything else to "Yes".
Upvotes: 4
Reputation: 398
You can merge
by ID
first, then create your new columns. Here is a data.table
solution:
library(data.table)
setDT(stu1, key="ID")
setDT(stu2, key="ID")
stu1 = merge(stu1, stu2, all.x=TRUE)
stu1[, BSCProg:=ifelse(grepl("^BSC", Prog), "Yes", "No")]
Upvotes: 1