Sat
Sat

Reputation: 23

Function for lookup in r

Having completed some basic course in R, have been trying to do my first task in R.

I have a dataframe (say DATA) about a million records and another dataframe (say LOOKUP) of about some 100 records.

I need to use the logic stored in LOOKUP against each record in the DATA and add a new column (say FOUND) with values YES/NO.

Please see below both the dataframes with some sample data:

> dataf <- data.frame(stringsAsFactors = FALSE, year=c(1980,1982,1985,1981,1970),name=c("abc","def","abc","klm","nop"),id=c("123bb23","234ab23","345bc23","123bc15","124bc45"))
> lookup <- data.frame(stringsAsFactors = FALSE, year=c(1980,1981,1982),name=c("abc","klm","nop"),digit=c(5,5,4),letter=c("b","c","b"))
> dataf
  year name      id
1 1980  abc 123bb23
2 1982  def 234ab23
3 1985  abc 345bc23
4 1981  klm 123bc15
5 1970  nop 124bc45
> lookup
  year name digit letter
1 1980  abc     5      b
2 1981  klm     5      c
3 1982  nop     4      b

I need my output to be something like below:

  year name      id found
1 1980  abc 123bb23   YES
2 1982  def 234ab23    NO
3 1985  abc 345bc23    NO
4 1981  klm 123bc15    YES
5 1970  nop 124bc45    NO

my function:

#hybrid FUNCTION
hybridfun <- function(df, lukup){
  for (j in 1:nrow(df)){
    df$found = "NO"
    for (i in 1:nrow(lukup)){
    if (df[[1]][[j]] == lukup[[1]][[i]])
    if (df[[2]][[j]] == lukup[[2]][[i]])
    if (substring(df[[3]][[j]], lukup[[3]][[i]], lukup[[3]][[i]]) == lukup[[4]][[i]]){
        df$found = "YES"
        break
    }
    }
  }
}

i'm calling the function as below:

hybridfun(dataf, lookup)

It looks like its doing something, but the output isn't showing as i expected

Please can someone help. If you think you need any further information, please let me know and I'll edit my post.

Upvotes: 1

Views: 160

Answers (2)

Onyambu
Onyambu

Reputation: 79288

x=do.call(paste,cbind(dataf[1:2],substring(dataf$id,lookup$digit,lookup$digit)))
y=do.call(paste,lookup[-3])

dataf$found=ifelse(x%in%y,"YES","NO")
dataf
  year name      id found
1 1980  abc 123bb23   YES
2 1982  def 234ab23    NO
3 1985  abc 345bc23    NO
4 1981  klm 123bc15   YES
5 1970  nop 124bc45    NO

Upvotes: 1

www
www

Reputation: 39154

A solution using and .

library(dplyr)
library(tidyr)

dataf2 <- dataf %>%
  left_join(lookup, by = c("year", "name")) %>%
  mutate(found = case_when(
    str_sub(id, start = digit, end = digit) == letter   ~ "YES",
    TRUE                                                ~ "NO"
  )) %>%
  select(-digit, -letter)
dataf2
#   year name      id found
# 1 1980  abc 123bb23   YES
# 2 1982  def 234ab23    NO
# 3 1985  abc 345bc23    NO
# 4 1981  klm 123bc15   YES
# 5 1970  nop 124bc45    NO

We can also turn this into a function.

hybridfun <- function(dataf, lookup){
  dataf2 <- dataf %>%
    dplyr::left_join(lookup, by = c("year", "name")) %>%
    dplyr::mutate(found = dplyr::case_when(
      stringr::str_sub(id, start = digit, end = digit) == letter   ~ "YES",
      TRUE                                                         ~ "NO"
    )) %>%
    dplyr::select(-digit, -letter)
  return(dataf2)
}
hybridfun(dataf, lookup)
#   year name      id found
# 1 1980  abc 123bb23   YES
# 2 1982  def 234ab23    NO
# 3 1985  abc 345bc23    NO
# 4 1981  klm 123bc15   YES
# 5 1970  nop 124bc45    NO

Upvotes: 1

Related Questions