Reputation: 95
How can I convert a full text to lowercase but keep the acronyms in uppercase using R? I need it for text mining with the udpipe package. I could of course use uppercase throughout, but is there any way to retain the uppercase acronyms while lowercasing everything else?
tolower('NASA IS A US COMPANY')
Expected: NASA is a US company
Actual: nasa is a us company
Upvotes: 3
Views: 723
Reputation: 21
How about this?
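acronyms <- c('NASA', 'US')
test <- 'NASA IS A US COMPANY'
# Lowercase the whole string, then split it into words
a <- tolower(test)
b <- strsplit(a, " ")[[1]]
# Restore uppercase for every word that matches a known acronym
for (i in seq_along(b)) {
  if (toupper(b[i]) %in% acronyms) {
    b[i] <- toupper(b[i])
  }
}
c <- paste(b, collapse = " ")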
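The same idea can also be written without an explicit loop. This is only a minimal sketch: the function name `lower_keep_acronyms` is made up for illustration, and it assumes words are separated by single spaces with no punctuation attached to them.

```r
# Hypothetical helper: lowercase text, keeping the listed acronyms in uppercase.
lower_keep_acronyms <- function(x, acronyms) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  # Compare case-insensitively so both "nasa" and "NASA" count as a match
  is_acronym <- toupper(words) %in% toupper(acronyms)
  # Vectorized choice: uppercase for acronyms, lowercase for everything else
  paste(ifelse(is_acronym, toupper(words), tolower(words)), collapse = " ")
}

lower_keep_acronyms("NASA IS A US COMPANY", c("NASA", "US"))
# [1] "NASA is a US company"
```

A word with punctuation attached, such as "US,", would not match the acronym list here, so tokenize or strip punctuation first if the text contains it.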
Upvotes: 0
Reputation: 776
I edited the code from "Capitalize the first letter of both words in a two word string" just a little.
simpleCap <- function(x, abr) {
  s <- strsplit(x, " ")[[1]]
  # TRUE for words that are listed abbreviations
  is_abr <- s %in% abr
  result <- s
  # Capitalize the first letter of ordinary words and lowercase the rest
  result[!is_abr] <- paste(toupper(substring(s[!is_abr], 1, 1)),
                           tolower(substring(s[!is_abr], 2)), sep = "")
  # Abbreviations are left exactly as they appear in the input
  paste(result, collapse = " ")
}
You have to supply the abbreviations yourself, for example:
abr <- c("NASA", "US")
After that, you get the result below:
simpleCap(abr = abr, 'NASA IS A US COMPANY')
[1] "NASA Is A US Company"
Upvotes: 2
Reputation: 13319
We can do the following, where test is the input string:
paste(lapply(strsplit(test, " "), function(x)
  ifelse(x %in% toupper(tm::stopwords()), tolower(x), x))[[1]], collapse = " ")
[1] "NASA is a US COMPANY"
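As the output shows, the stopword approach lowercases only stopwords, so COMPANY stays in uppercase. If the acronyms are known in advance, one possible alternative is a single regular expression. This is a rough sketch, not a tested solution for udpipe pipelines: the name `lower_except` is invented here, and it assumes the acronyms contain only letters or digits (no regex metacharacters).

```r
# Hypothetical helper: lowercase every word except the listed acronyms,
# using a Perl-style regular expression.
lower_except <- function(x, acronyms) {
  # The negative lookahead skips whole words that exactly match an acronym.
  # Assumption: acronyms contain no regex metacharacters.
  pattern <- paste0("\\b(?!(?:", paste(acronyms, collapse = "|"), ")\\b)(\\w+)")
  # "\\L" lowercases the captured word; it is only available with perl = TRUE
  gsub(pattern, "\\L\\1", x, perl = TRUE)
}

lower_except("NASA IS A US COMPANY", c("NASA", "US"))
# [1] "NASA is a US company"
```

Because the match is anchored on word boundaries, punctuation next to a word is left alone, and a longer word such as "USA" is still lowercased when only "US" is listed.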
Upvotes: 3