This is a continuation of the following thread:
Creating Binary Identifiers Based On Condition Of Word Combinations For Filter
The expected output is the same as in that thread.
I am now writing a function that accepts column names dynamically, as variables.
If I were to run it manually, this is the code I am aiming for:
library(dplyr)
df <- df %>% group_by(id, date) %>% mutate(flag1 = if (eval(parse(text = conditions))) grepl(pattern, item_name2) else FALSE)
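To make the target concrete, here is a minimal base-R sketch of what that pipeline computes, assuming the rule implied by the `conditions` string in this question: a row is flagged when its (id, date) group contains every required word and the row itself matches `pattern`. Everything in it is hypothetical (the example data, the helper `flag_groups`, the required words), and it swaps dplyr for `tapply()`, looking columns up by string with `[[` so that no `eval(parse())` is needed.

```r
# Hypothetical example data: two (id, date) groups
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = c("d1", "d1", "d1", "d1", "d1"),
  Item = c("Alpha a", "Bravo b", "Charlie c", "Alpha d", "Delta e"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every column name arrives as a character string
flag_groups <- function(df, groupcolumns, item, required, pattern) {
  # one group key per row, built from the grouping columns looked up by name
  key <- interaction(df[groupcolumns], drop = TRUE)
  # per group: TRUE only if every required word occurs somewhere in the group
  group_ok <- tapply(df[[item]], key, function(x)
    all(vapply(required, function(w) any(grepl(w, x)), logical(1))))
  # broadcast the group result back to rows, then require a row-level match
  as.vector(group_ok[as.character(key)]) & grepl(pattern, df[[item]])
}

df$flag1 <- flag_groups(df, c("id", "date"), "Item", c("Alpha", "Bravo"), "Alpha|Bravo")
print(df)
```

Because the grouping and item columns are passed as strings, the same call works for whatever names the user supplies.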
To make the code work with dynamic variable names, I have written it this way:
groupcolumns <- c(id, date)
# where id and date will be entered into the function as character strings by the user
variable <- list(~if(eval(parse(text=conditions))) grepl(pattern, item) else FALSE)
# converting to formula to use with dynamically generated column names
# "conditions" being the following character string, which I can automatically generate:
conditions <- 'any(grepl("Alpha", Item)) & any(grepl("Bravo", Item))'
This becomes:
df <- df %>% group_by_(.dots = groupcolumns) %>% mutate_(.dots = setNames(variable, flags[1]))
# where flags is a predefined vector of column names that I have created
flags <- paste0("flag", 1:100)
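Since the `conditions` string is generated automatically, here is a sketch of one way to build it, assuming a hypothetical vector `required` of words that must all occur and the column name held as a string in `item`. It also shows that `eval()` accepts a data frame as `envir`, so the parsed condition can be evaluated against one group's rows rather than the global environment. The single-group data frame is made up.

```r
# Hypothetical words that must all appear in a group
required <- c("Alpha", "Bravo")
# Hypothetical name of the column to search, supplied as a string
item <- "Item"

# One any(grepl(...)) clause per word, joined with " & "
conditions <- paste(sprintf('any(grepl("%s", %s))', required, item), collapse = " & ")
print(conditions)

# Evaluate the parsed condition with a data frame as the environment,
# so the symbol Item resolves to that data frame's column
group_rows <- data.frame(Item = c("Alpha x", "Bravo y", "Charlie z"),
                         stringsAsFactors = FALSE)
group_flag <- eval(parse(text = conditions), envir = group_rows)
print(group_flag)
```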
The problem is that I cannot get grepl to take the "item" column dynamically. If I refer to it as "df$item" through eval(parse(text="df$item")), piping no longer does what I intend: df$item is the whole column of df rather than the rows of the current group created by group_by_, so (naturally) it results in an error. The same applies to the conditions that I set.
Is there a way to tell grepl to use a dynamic variable name?
Thanks a lot (especially to akrun)!
Edit 1:
I tried the following, and now the name of the item is passed into grepl without a problem:
variable <- list(~if(eval(parse(text=conditions))) grepl(pattern, as.name(item)) else FALSE)
However, piping still does not seem to work: the output of as.name(item) is treated as an object, which does not exist in the environment.
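A quick way to see what grepl() actually receives in Edit 1 (a sketch with hypothetical data): as.name() returns a symbol, and grepl() converts non-character input with as.character(), so the search runs over the column's name rather than its values. Looking the column up with `[[` gives the values instead.

```r
item <- "Item"  # column name held as a string
df <- data.frame(Item = c("Alpha a", "Charlie c"), stringsAsFactors = FALSE)

sym <- as.name(item)
print(class(sym))                       # "name": a symbol, not the column
by_symbol <- grepl("Alpha", sym)        # searches the text "Item", so FALSE
by_name   <- grepl("Alpha", df[[item]]) # searches the column's values
print(by_symbol)
print(by_name)
```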
Edit 2:
Trying do() in dplyr:
variable <- list(~if(eval(parse(text=conditions))) grepl(pattern, .$deparse(as.name(item))) else FALSE)
df <- df %>% group_by_(.dots = groupcolumns) %>% do_(.dots = setNames(variable, combiflags[1]))
which throws the error:
Error: object 'Item' not found
If I understand your question correctly, you want to pass both the patterns and the object they are searched against to grepl dynamically. The best solution depends entirely on how you store the patterns and the objects to be searched, but I have a few ideas that should help.
For dynamic patterns, try collapsing a vector of patterns into one regular expression with the paste function. Joining them with "|" lets you search for many different patterns at once, since an element matches if it contains any of them.
grepl(paste(your.pattern.list, collapse="|"), item)
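A small illustration, with a hypothetical pattern vector and item vector: the patterns are joined into the single regular expression "Alpha|Bravo", and an element counts as a match if it contains any of them. Matching is case-sensitive unless ignore.case = TRUE is passed, and patterns containing regex metacharacters such as . or ( would need escaping first.

```r
# Hypothetical patterns and items, for illustration only
your.pattern.list <- c("Alpha", "Bravo")
item <- c("Alpha one", "Charlie two", "Bravo three", "alpha four")

# Combine the patterns into one alternation regex
combined <- paste(your.pattern.list, collapse = "|")

# TRUE wherever an element contains at least one pattern (case-sensitive)
hits <- grepl(combined, item)
print(combined)
print(hits)
```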
Let's say you store many patterns of interest in a directory, perhaps collected automatically from a server or produced by some other process. If the patterns are in separate files, you can build a pattern vector from each file like this:
library(readr)  # read_csv() comes from the readr package
#set working directory
setwd("/path/to/files/i/want")
#make a list of all files in this directory
inFilePaths = list.files(path=".", pattern=glob2rx("*"), full.names=TRUE)
#collect one result per file; a bare grepl() call inside a for loop is discarded
results = list()
#perform a function for each file in the list
for (inFilePath in inFilePaths)
{
  #if each file in the folder is a table/matrix/dataframe of patterns try this
  inFileData = read_csv(inFilePath)
  vectorData = as.vector(inFileData$ColumnOfPatterns)
  results[[inFilePath]] = grepl(paste(vectorData, collapse="|"), item)
}
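A self-contained variant of the pattern-file loop, for trying it out without a real directory: it writes two hypothetical pattern files to tempdir(), uses base R's read.csv() in place of readr::read_csv(), passes full paths instead of calling setwd(), and keeps each file's result via lapply(). The column name ColumnOfPatterns is taken from the answer; the file names, their contents, and the item vector are made up.

```r
# Hypothetical setup: two small CSV files of patterns in a temporary directory
patternDir <- file.path(tempdir(), "patterns")
dir.create(patternDir, showWarnings = FALSE)
write.csv(data.frame(ColumnOfPatterns = c("Alpha", "Bravo")),
          file.path(patternDir, "set1.csv"), row.names = FALSE)
write.csv(data.frame(ColumnOfPatterns = "Charlie"),
          file.path(patternDir, "set2.csv"), row.names = FALSE)

# Hypothetical character vector to search
item <- c("Alpha one", "Charlie two", "Bravo three")

# Full paths, so no setwd() is needed
inFilePaths <- list.files(path = patternDir, pattern = "\\.csv$", full.names = TRUE)

# One logical vector per pattern file
results <- lapply(inFilePaths, function(inFilePath) {
  inFileData <- read.csv(inFilePath, stringsAsFactors = FALSE)
  grepl(paste(inFileData$ColumnOfPatterns, collapse = "|"), item)
})
names(results) <- basename(inFilePaths)
print(results)
```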
For dynamically specifying the item, you can use an almost identical framework:
library(readr)  # read_csv() comes from the readr package
#set working directory
setwd("/path/to/files/i/want")
#make a list of all files in this directory
inFilePaths = list.files(path=".", pattern=glob2rx("*"), full.names=TRUE)
#collect one result per file; a bare grepl() call inside a for loop is discarded
results = list()
#perform a function for each file in the list
for (inFilePath in inFilePaths)
{
  #if each file in the folder is a table/matrix/dataframe of data to be searched try this
  inFileData = read_csv(inFilePath)
  results[[inFilePath]] = grepl(pattern, inFileData$ColumnToBeSearched)
}
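The same idea as a runnable sketch, with one addition that bears on the original question: if the name of the column to search is itself held in a variable, inFileData[[columnToSearch]] looks it up by string, whereas inFileData$columnToSearch would look for a column literally named columnToSearch. The files, their contents, and the pattern are hypothetical, and base read.csv() stands in for readr::read_csv().

```r
# Hypothetical setup: two CSV files of data to be searched
dataDir <- file.path(tempdir(), "to_search")
dir.create(dataDir, showWarnings = FALSE)
write.csv(data.frame(ColumnToBeSearched = c("Alpha one", "Delta two")),
          file.path(dataDir, "a.csv"), row.names = FALSE)
write.csv(data.frame(ColumnToBeSearched = c("Echo", "Bravo", "Alpha")),
          file.path(dataDir, "b.csv"), row.names = FALSE)

pattern <- "Alpha|Bravo"
# Column name held in a variable; [[ looks it up by string, unlike $
columnToSearch <- "ColumnToBeSearched"

inFilePaths <- list.files(path = dataDir, pattern = "\\.csv$", full.names = TRUE)
results <- lapply(inFilePaths, function(inFilePath) {
  inFileData <- read.csv(inFilePath, stringsAsFactors = FALSE)
  grepl(pattern, inFileData[[columnToSearch]])
})
names(results) <- basename(inFilePaths)
print(results)
```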
If this is too far off from what you envisioned, please update your question with details about how the data you are using is stored.