Reputation: 197
I see this is a common issue, but I can't work out what to do from reading other posts or from trying to understand functional programming, which is new to me. Functions are closures in R, encapsulating the environment they were created in? The code I have is:
# Remove numbers from text
minus_TextNum <- function(df, new.df){
  new.df <- mutate(df, text = gsub(x = text, pattern = "[0-9]+|\\(.*\\)", replacement = "")) %>% # and/or whatever's in brackets
    unnest_tokens(input = text, output = word) %>%
    filter(!word %in% c(stop_words$word, "patient")) %>%
    group_by(id) %>%
    summarise(text = paste(word, collapse = " "))
  return(new.df)
}
minus_TextNum(TidySymptoms)
The error is as follows:
Error: Problem with `mutate()` column `text`.
ℹ `text = gsub(x = text, pattern = "[0-9]+|\\(.*\\)", replacement = "")`.
x cannot coerce type 'closure' to vector of type 'character'
I don't understand what type closure is, and this is a simple function that works on a simple dataset I created to test. The problem arises only when I use the real-world dataset.
Any feedback appreciated. Reproducible sample below:
# Remove numbers and/or anything in brackets
# Test Data
mydata <- data.frame(id = 1:8,
                     text = c("112773 Nissan Micra, Car, (10 pcs)",
                              "112774 Nissan Micra, Car, (10 pcs)",
                              "112775 Nissan Micra, Car, (10 pcs)",
                              "112776 Volkswagon Beetle, Car, (3 pcs)",
                              "112777 Toyota Corolla, Car, (12 pcs)",
                              "112778 Nissan Micra, Car, (10 pcs)",
                              "112779 Toyota Prius, Car, (9 pcs)",
                              "112780 Toyota Corolla, Car, (12 pcs)"),
                     stringsAsFactors = FALSE)
library(dplyr)
library(tidytext)
# remove numbers from text data
data(stop_words)
minus_TextNum <- function(df, new.df){
  new.df <- mutate(df, text = gsub(x = text, pattern = "[0-9]+|\\(.*\\)", replacement = "")) %>% # and/or whatever's in brackets
    unnest_tokens(input = text, output = word) %>%
    filter(!word %in% c(stop_words$word, "car")) %>%
    group_by(id) %>%
    summarise(text = paste(word, collapse = " "))
  return(new.df)
}
minus_TextNum(mydata)
dput(head(TidySymptoms, n = 10))
structure(list(word = c("epiglottis", "swelled", "hinder", "swallowing",
"pictures", "benadryl", "tylenol", "approximately", "30", "min")),
row.names = c(NA, 10L), class = "data.frame")
Upvotes: 0
Views: 824
Reputation: 388982
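The TidySymptoms data has no id column in it. Assuming that is a mistake and the column already exists in your real data, you can make the following changes to the function.
You don't need to pass new.df to the function, because it is created inside the function.
The column in TidySymptoms is called word, but you are using text in the function. Since the data has no text column, R resolves the name text to the graphics function text(), which is a closure, and gsub() cannot coerce a function to character. That is where the error comes from.
Try this code.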
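library(dplyr)
library(tidytext)
data(stop_words)
minus_TextNum <- function(df){
  # Clean the word column into text, then re-tokenise and drop stop words
  new.df <- mutate(df, text = gsub(x = word, pattern = "[0-9]+|\\(.*\\)", replacement = "")) %>%
    unnest_tokens(input = text, output = word) %>%
    filter(!word %in% c(stop_words$word, "patient")) %>%
    group_by(id) %>%
    summarise(text = paste(word, collapse = " "))
  # Return the object that was actually created above
  return(new.df)
}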
minus_TextNum(TidySymptoms)
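To see where the 'closure' message comes from, here is a minimal base R sketch that reproduces it without dplyr or tidytext. The small symptoms data frame is made up to mimic the dput() output in the question, and check_columns() is a hypothetical helper of my own, not part of any package. It only illustrates one way to fail early with a clearer message when a column is missing.

```r
# Made-up data shaped like the dput() output in the question:
# it has a word column, but no id and no text column.
symptoms <- data.frame(word = c("epiglottis", "swelled", "30", "min"),
                       stringsAsFactors = FALSE)

# With no text column in the data, the name text is looked up on the
# search path and finds graphics::text(), a function (type "closure").
print(is.function(text))
print(typeof(text))

# Reproduce the error with base R only: gsub() tries to coerce the
# function to character and fails.
msg <- tryCatch(
  with(symptoms, gsub(pattern = "[0-9]+", replacement = "", x = text)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical helper (an assumption, not a dplyr or tidytext function):
# stop with a readable message when required columns are missing.
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(df)
}

guard_msg <- tryCatch(
  check_columns(symptoms, c("id", "text")),
  error = function(e) conditionMessage(e)
)
print(guard_msg)
```

Calling something like check_columns(df, c("id", "word")) as the first line of minus_TextNum() would surface a missing column directly instead of the confusing closure error.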
Upvotes: 1