Reputation: 2067
I am trying to use custom functions with the tm_map function of the tm package; however, it converts the data into a different format that I cannot continue working with.
For example, I use:
library(tm)
library(qdapRegex)
docs <- data.frame(doc_id = c("doc_1", "doc_2"),
text = c("This is a text. With some more text, www.yahoo.com", "This another one. with some different text www.google.com"),
dmeta1 = 1:2, dmeta2 = letters[1:2],
stringsAsFactors = FALSE)
docs = VCorpus(DataframeSource(docs))
content(docs[[1]])
docs <- tm_map(docs, content_transformer(tolower)) # This Works fine
content(docs[[1]])
nchar_rm <- function(x){
  gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
} # Custom function to remove words of one or two letters
docs <- tm_map(docs, nchar_rm) # implement custom function
content(docs[[1]]) # returns an error.
Error:
Error in UseMethod("content", x) :
no applicable method for 'content' applied to an object of class "character"
Also, calling docs <- tm_map(docs, rm_url) with the rm_url function from the qdapRegex package returns an error.
Upvotes: 0
Views: 320
Reputation: 388982
Wrap the custom function in content_transformer, the same way you did for tolower:
library(tm)
docs <- tm_map(docs, content_transformer(nchar_rm))
content(docs[[1]])
#[1] "This text. With some more text, www.yahoo.com"
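To see why the wrapper matters: tm_map calls the function on each document object, so a function that returns a plain string replaces the document with a character vector, and content() then has nothing to dispatch on. As a rough sketch of what content_transformer(nchar_rm) amounts to, the hand-written wrapper below does the same job. The names nchar_rm_doc and corp are made up for illustration, and this assumes the default plain-text documents that VCorpus creates.

```r
library(tm)

# Custom function from the question: drops words of one or two letters
nchar_rm <- function(x) {
  gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
}

# Hypothetical hand-written stand-in for content_transformer(nchar_rm):
# read the text out of the document, transform it, write it back,
# and return the document object itself rather than a bare string.
nchar_rm_doc <- function(doc) {
  content(doc) <- nchar_rm(content(doc))
  doc
}

# Small illustrative corpus (not the one from the question)
corp <- VCorpus(VectorSource(c("This is a text.", "This another one.")))
corp <- tm_map(corp, nchar_rm_doc)

class(corp[[1]])    # still a text document class, not "character"
content(corp[[1]])  # so content() keeps working
```

In practice content_transformer is the shorter way to get this behaviour; the sketch is only meant to show what the wrapper has to preserve.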
It will also work with rm_url:
library(qdapRegex)
docs <- tm_map(docs, content_transformer(rm_url))
content(docs[[1]])
#[1] "This is a text. With some more text,"
However, note that you can do this without any tm functions at all, using lapply/sapply/map etc. on the text column of the original data frame (that is, docs before it was converted to a VCorpus):
lapply(docs$text, rm_url)
lapply(docs$text, nchar_rm)
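If you go the base R route, a minimal sketch could look like the following. It rebuilds a small data frame (called df here so it does not clash with the corpus named docs), and uses vapply instead of lapply so the result is a character vector that can be stored back in a column; df and cleaned are illustrative names only.

```r
# Custom function from the question: drops words of one or two letters
nchar_rm <- function(x) {
  gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
}

# Illustrative data frame with the same layout as in the question
df <- data.frame(
  doc_id = c("doc_1", "doc_2"),
  text = c("This is a text. With some more text",
           "This another one. with some different text"),
  stringsAsFactors = FALSE
)

# vapply returns a character vector (lapply would return a list);
# USE.NAMES = FALSE keeps the result free of element names
df$cleaned <- vapply(df$text, nchar_rm, character(1), USE.NAMES = FALSE)

df$cleaned
```

Since gsub is already vectorised, nchar_rm(df$text) would give the same result here; the apply-style call is only needed for functions that handle one string at a time.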
Upvotes: 2