Bg1850
Bg1850

Reputation: 3092

How to know if the value if a column is part of another column's value in R data.table

I have a data.table where I have few customers,some day value and pay_day value . pay_day is a vector of length 5 for each customer and it consists of day values

I want to check each day value with the pay_day vector whether the day is part of the pay_day

Here is a dummy data for this (pardon for the messy way to create the data ) could not think of a better way atm

customers <- c("179288" ,"146506" ,"202287","16207","152979","14421","41395","199103","183467","151902")
mdays <- 1:31
set.seed(1)
data <- sort(rep(customers,100))
days <- sample(mdays,1000,replace=T)

xyz <- cbind(data,days)
x <- vector(length=1000L)
j <- 1
for( i in 1:10){

  set.seed(i) ## I wanted diff dates to be picked 
  m <- sample(mdays,5)


  while(j <=100*i){
  x[j] <- paste(m,collapse = ",")
  j <- j+1
  }
}

xyz <- cbind(xyz,x)
require(data.table)
my_data <-  setDT(as.data.frame(xyz))

setnames(my_data, c("cust","days","pay_days"))
my_data[,pay:=runif(1000,min = 0,max=10000)]

Now I want for each cust the vector of pays which happens in pay_days.

i have tried various ways but cant seem to figure it out , my initial thought is to create a flag based if days is a subset of pay_days and then take the pays according to the flag my_data[,ifelse(grepl(days,pay_days),1,0),cust]

this does not work as I expect it to . I dont want to use a native loop as the

actual data is huge .

Upvotes: 0

Views: 68

Answers (2)

user6278894
user6278894

Reputation:

Another one-liner approach using the data table:

my_data[,result:=sum(unlist(lapply(strsplit(as.character(pay_days),","),match,days)),na.rm=T)>0,by=1:nrow(my_data)]

Upvotes: 0

bendae
bendae

Reputation: 789

Using tidyr to split the pay_days column into and then checking if days is in pay_days:

library(tidyr)
library(dplyr)

# creating long-form data
tidier <- my_data %>% 
  mutate(pay_days = strsplit(as.character(pay_days), ",")) %>%
  unnest(pay_days)

# casting as numeric to make factor & character columns comparable
tidier[, days := as.numeric(days)]
tidier[, pay_days := as.numeric(pay_days)]

tidier[days == pay_days, pay, by=cust]

Not sure how this performs for large data, as you multiply your table length by the number of days in pay_days...

Side note: I can't comment yet, but to replicate your data one needs to add library(data.table) and initialize x x<-vector() which is otherwise not found, as Dee also points out.

Upvotes: 1

Related Questions