Dejie

Reputation: 95

Change text to lowercase in R keeping acronyms in uppercase in text mining

How can I convert a full text to lowercase in R while keeping the acronyms in uppercase? I need this for text mining with the udpipe package. I could of course leave everything in uppercase, but is there any way to retain the uppercase acronyms while lowercasing the rest?

tolower('NASA IS A US COMPANY')

Expected: NASA is a US company

Actual: nasa is a us company

Upvotes: 3

Views: 723

Answers (3)

Marc

Reputation: 21

How about this?

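acronyms <- c('NASA', 'US')
test <- 'NASA IS A US COMPANY'

# Lowercase everything, then split into words
words <- strsplit(tolower(test), " ")[[1]]

# Restore uppercase for every word that matches a known acronym
is_acronym <- toupper(words) %in% acronyms
words[is_acronym] <- toupper(words[is_acronym])

result <- paste(words, collapse = " ")
result
[1] "NASA is a US company"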
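Splitting on spaces misses acronyms that are followed by punctuation (for example "NASA,"). A minimal sketch of a regex variant of the same idea is below. The function name lower_keep_acronyms is hypothetical, and it assumes the acronyms contain no regex metacharacters (something like "U.S." would need escaping first). Like the loop above, it uppercases every match of a listed acronym, so an ordinary lowercase "us" in the input also becomes "US".

```r
# Hypothetical helper: lowercase the text, then restore listed acronyms.
# Assumes `acronyms` holds plain alphanumeric strings (no regex metacharacters).
lower_keep_acronyms <- function(text, acronyms) {
  out <- tolower(text)
  for (a in acronyms) {
    # \\b anchors the match to whole words, so "status" is not touched by "US"
    pattern <- paste0("\\b", tolower(a), "\\b")
    out <- gsub(pattern, a, out, perl = TRUE)
  }
  out
}

lower_keep_acronyms("NASA IS A US COMPANY.", c("NASA", "US"))
# [1] "NASA is a US company."
```

Because tolower() and gsub() are vectorized, the same call also works on a character vector of documents.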

Upvotes: 0

Steve Lee

Reputation: 776

I edited the code from "Capitalize the first letter of both words in a two word string" just a little.

simpleCap <- function(x, abr) {
  s <- strsplit(x, " ")[[1]]
  is_abr <- s %in% abr

  # Capitalize the first letter of each non-acronym word, lowercase the rest
  s[!is_abr] <- paste0(toupper(substring(s[!is_abr], 1, 1)),
                       tolower(substring(s[!is_abr], 2)))

  # Acronyms keep their original form and position
  paste(s, collapse = " ")
}

You have to maintain a list of abbreviations yourself, for example:

abr <- c("NASA", "US")

After that, you get the following result:

simpleCap('NASA IS A US COMPANY', abr = abr)
[1] "NASA Is A US Company"

Upvotes: 2

NelsonGon

Reputation: 13319

We can do the following, where test is the input string:

words <- strsplit(test, " ")[[1]]
paste(ifelse(words %in% toupper(tm::stopwords()), tolower(words), words),
      collapse = " ")
[1] "NASA is a US COMPANY"
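The stopword filter only lowercases words found in tm::stopwords(), which is why COMPANY stays uppercase in the output above. A rough sketch of the inverse filter, without the tm dependency, is to lowercase every word except those in an acronym list. The function name keep_acronyms_case and the keep_upper list are assumptions for illustration, not part of any package.

```r
# Hypothetical helper: lowercase every word that is not a listed acronym.
# Only words that already appear in uppercase in the input are preserved.
keep_acronyms_case <- function(text, keep_upper) {
  words <- strsplit(text, " ")[[1]]
  paste(ifelse(words %in% keep_upper, words, tolower(words)), collapse = " ")
}

test <- "NASA IS A US COMPANY"
keep_acronyms_case(test, keep_upper = c("NASA", "US"))
# [1] "NASA is a US company"
```

Since the comparison is case-sensitive, a lowercase "us" in mixed-case input is left alone, while an all-uppercase input still needs the list to tell US apart from ordinary words.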

Upvotes: 3
