Reputation: 23
Having completed some basic course in R, have been trying to do my first task in R.
I have a dataframe (say DATA) about a million records and another dataframe (say LOOKUP) of about some 100 records.
I need to use the logic stored in LOOKUP against each record in the DATA and add a new column (say FOUND) with values YES/NO.
Please see below both the dataframes with some sample data:
> dataf <- data.frame(stringsAsFactors = FALSE, year=c(1980,1982,1985,1981,1970),name=c("abc","def","abc","klm","nop"),id=c("123bb23","234ab23","345bc23","123bc15","124bc45"))
> lookup <- data.frame(stringsAsFactors = FALSE, year=c(1980,1981,1982),name=c("abc","klm","nop"),digit=c(5,5,4),letter=c("b","c","b"))
> dataf
year name id
1 1980 abc 123bb23
2 1982 def 234ab23
3 1985 abc 345bc23
4 1981 klm 123bc15
5 1970 nop 124bc45
> lookup
year name digit letter
1 1980 abc 5 b
2 1981 klm 5 c
3 1982 nop 4 b
I need my output to be something like below:
year name id found
1 1980 abc 123bb23 YES
2 1982 def 234ab23 NO
3 1985 abc 345bc23 NO
4 1981 klm 123bc15 YES
5 1970 nop 124bc45 NO
my function:
#hybrid FUNCTION
hybridfun <- function(df, lukup){
for (j in 1:nrow(df)){
df$found = "NO"
for (i in 1:nrow(lukup)){
if (df[[1]][[j]] == lukup[[1]][[i]])
if (df[[2]][[j]] == lukup[[2]][[i]])
if (substring(df[[3]][[j]], lukup[[3]][[i]], lukup[[3]][[i]]) == lukup[[4]][[i]]){
df$found = "YES"
break
}
}
}
}
i'm calling the function as below:
hybridfun(dataf, lookup)
It looks like its doing something, but the output isn't showing as i expected
Please can someone help. If you think you need any further information, please let me know and I'll edit my post.
Upvotes: 1
Views: 160
Reputation: 79288
x=do.call(paste,cbind(dataf[1:2],substring(dataf$id,lookup$digit,lookup$digit)))
y=do.call(paste,lookup[-3])
dataf$found=ifelse(x%in%y,"YES","NO")
dataf
year name id found
1 1980 abc 123bb23 YES
2 1982 def 234ab23 NO
3 1985 abc 345bc23 NO
4 1981 klm 123bc15 YES
5 1970 nop 124bc45 NO
Upvotes: 1
Reputation: 39154
A solution using dplyr and stringr.
library(dplyr)
library(tidyr)
dataf2 <- dataf %>%
left_join(lookup, by = c("year", "name")) %>%
mutate(found = case_when(
str_sub(id, start = digit, end = digit) == letter ~ "YES",
TRUE ~ "NO"
)) %>%
select(-digit, -letter)
dataf2
# year name id found
# 1 1980 abc 123bb23 YES
# 2 1982 def 234ab23 NO
# 3 1985 abc 345bc23 NO
# 4 1981 klm 123bc15 YES
# 5 1970 nop 124bc45 NO
We can also turn this into a function.
hybridfun <- function(dataf, lookup){
dataf2 <- dataf %>%
dplyr::left_join(lookup, by = c("year", "name")) %>%
dplyr::mutate(found = dplyr::case_when(
stringr::str_sub(id, start = digit, end = digit) == letter ~ "YES",
TRUE ~ "NO"
)) %>%
dplyr::select(-digit, -letter)
return(dataf2)
}
hybridfun(dataf, lookup)
# year name id found
# 1 1980 abc 123bb23 YES
# 2 1982 def 234ab23 NO
# 3 1985 abc 345bc23 NO
# 4 1981 klm 123bc15 YES
# 5 1970 nop 124bc45 NO
Upvotes: 1