user8959427

Reputation: 2067

Custom functions in `tm_map` from the package `tm`

I am trying to use custom functions with the tm_map function from the tm package, but it converts the data into a different format that I cannot continue working with.

For example, I use:

library(tm)
library(qdapRegex)

docs <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("This is a text. With some more text, www.yahoo.com", "This another one. with some different text www.google.com"),
                   dmeta1 = 1:2, dmeta2 = letters[1:2],
                   stringsAsFactors = FALSE)

docs = VCorpus(DataframeSource(docs))

content(docs[[1]])

docs <- tm_map(docs, content_transformer(tolower)) # This works fine
content(docs[[1]])

nchar_rm <- function(x){
  gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
} # Custom function to remove words of one or two letters

docs <- tm_map(docs, nchar_rm) # implement custom function
content(docs[[1]]) # returns an error.

Error:

Error in UseMethod("content", x) : 
  no applicable method for 'content' applied to an object of class "character"

Similarly, docs <- tm_map(docs, rm_url), using rm_url from the qdapRegex package, returns an error.

Upvotes: 0

Views: 320

Answers (1)

Ronak Shah

Reputation: 388982

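Wrap the custom function in content_transformer, the same way you did with tolower. Without the wrapper, tm_map replaces each document with the plain character vector your function returns, which is why content() then fails on an object of class "character".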

library(tm)

docs <- tm_map(docs, content_transformer(nchar_rm)) 
content(docs[[1]])
#[1] "This  text. With some more text, www.yahoo.com"
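
To make the role of the wrapper more concrete, here is a rough sketch of a hand-written equivalent. It is an illustration only, not tm's actual source: nchar_rm_doc is a hypothetical helper name, and the small corpus is built from a VectorSource just to keep the example self-contained. The idea is that the function passed to tm_map should take a document and return a document, changing only its content.

```r
library(tm)

# Same custom function as in the question: drops words of one or two letters.
nchar_rm <- function(x) {
  gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
}

# Hypothetical hand-written stand-in for content_transformer(nchar_rm):
# modify only the content, then hand back the document itself.
nchar_rm_doc <- function(doc) {
  content(doc) <- nchar_rm(content(doc))
  doc
}

# Small throwaway corpus (assumption: two short example strings).
corp <- VCorpus(VectorSource(c("This is a text.", "This another one.")))

corp <- tm_map(corp, nchar_rm_doc)

class(corp[[1]])    # still a text document class, not "character"
content(corp[[1]])  # content() keeps working after the transformation
```

In practice content_transformer(nchar_rm) is the shorter way to get the same effect; the sketch only shows why returning the document, rather than a bare string, matters.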

It will also work with rm_url

library(qdapRegex)

docs <- tm_map(docs, content_transformer(rm_url))
content(docs[[1]])
#[1] "This is a text. With some more text,"

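However, you can also do this without any tm functions by applying the functions to the text column with lapply/sapply/map etc. Note that docs in the lines below must be the original data frame, not the corpus that was later assigned to the same name.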

lapply(docs$text, rm_url)
lapply(docs$text, nchar_rm)
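
A minimal sketch of that plain-vector route, assuming the text is still available as a data frame column. The name docs_df is hypothetical and is used here only to avoid clashing with the corpus called docs. Because gsub is vectorised, the custom function can also be called on the whole column directly, which gives a character vector instead of a list.

```r
# Hypothetical copy of the original data frame from the question.
docs_df <- data.frame(
  doc_id = c("doc_1", "doc_2"),
  text = c("This is a text. With some more text, www.yahoo.com",
           "This another one. with some different text www.google.com"),
  stringsAsFactors = FALSE
)

# Same custom function as in the question: drops words of one or two letters.
nchar_rm <- function(x) {
  gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
}

# lapply returns a list with one cleaned string per document.
cleaned_list <- lapply(docs_df$text, nchar_rm)

# gsub is vectorised, so calling the function on the column works too
# and returns a character vector of the same length.
cleaned_vec <- nchar_rm(docs_df$text)

# Store the cleaned text back in the data frame.
docs_df$text <- cleaned_vec
```

The cleaned data frame can then be turned into a corpus with VCorpus(DataframeSource(docs_df)) if the tm workflow is still needed afterwards.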

Upvotes: 2
