giac

Reputation: 4299

R - grep remove UPPER case rows

I would like to remove all the rows containing UPPERCASE words.

My data looks like this :

                                      dt
1        TRAVEL AND UNSPECIFIED TIME USE
2                      TRAVEL BY PURPOSE
3 Travel related to unspecified time use
4    Travel related to personal business

I don't understand why this isn't working

dt[-c(grep('[A-Z]', dt$dt)) , ] 

Strangely, it works when I generate random data on mtcars like this:

l = sample( c(letters[1:16], LETTERS[1:16]) ) 
mtcars$code = l
mtcars[-c( grep('[A-Z]', mtcars$code) ) , ] 

Can someone help me ?

dt = c("TRAVEL AND UNSPECIFIED TIME USE", 
"TRAVEL BY PURPOSE", 
"Travel related to unspecified time use",
"Travel related to personal business") 
dt = as.data.frame(dt)
dt$dt = as.character(dt$dt)

Upvotes: 3

Views: 2413

Answers (1)

akrun

Reputation: 887193

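Besides capital letters, the all-uppercase rows also contain spaces. A bare [A-Z] matches any string with at least one capital letter, which here is every row (each begins with "T"), so all rows get dropped. Instead, match one or more capital letters or spaces ([A-Z ]+) from the start (^) to the end ($) of the string in grepl, and negate (!) the result to keep everything else: lower-case strings, mixed-case strings, and all other possibilities. drop = FALSE keeps the result a data frame, because dt has only one column.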

dt[!grepl("^[A-Z ]+$",dt$dt),, drop = FALSE]
#                                   dt
#3 Travel related to unspecified time use
#4    Travel related to personal business
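
To see why the original attempt removes every row, here is a minimal sketch that compares the unanchored and anchored patterns on the question's data. The names `unanchored` and `anchored` are only illustrative.

```r
# Data from the question
dt <- data.frame(
  dt = c("TRAVEL AND UNSPECIFIED TIME USE",
         "TRAVEL BY PURPOSE",
         "Travel related to unspecified time use",
         "Travel related to personal business"),
  stringsAsFactors = FALSE
)

# Unanchored: matches any string containing at least one capital letter,
# so every row matches (each one starts with "T")
unanchored <- grep("[A-Z]", dt$dt)
unanchored
# [1] 1 2 3 4

# Anchored: matches only strings made up entirely of capitals and spaces
anchored <- grepl("^[A-Z ]+$", dt$dt)
anchored
# [1]  TRUE  TRUE FALSE FALSE
```

Since the unanchored pattern returns all four indices, dt[-c(1, 2, 3, 4), ] has nothing left to keep.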

In the OP's other example, 'l', each string is a single character, so [A-Z] works. However, it is better not to use - with grep. For example, suppose we have a vector in which no element is entirely upper-case:

v1 <- c('a', 'aB', 'b')
v1[-grep("^[A-Z]+$", v1)]
#character(0)

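This returns nothing because grep finds no match, and indexing with a negated empty vector selects no elements: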

grep("^[A-Z]+$", v1)
#integer(0)

However, negating (!) the logical vector from grepl gives the expected output:

v1[!grepl("^[A-Z]+$", v1)]
#[1] "a"  "aB" "b"
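
If index-based subsetting is preferred, one alternative (not part of the answer above, just a sketch) is grep's invert = TRUE argument, which returns the indices of the non-matching elements and so stays correct when nothing matches. The vector `v2` and the name `keep` are made up for illustration.

```r
v1 <- c("a", "aB", "b")

# invert = TRUE returns the indices of the elements that do NOT match,
# so no negative indexing is needed
keep <- grep("^[A-Z]+$", v1, invert = TRUE)
v1[keep]
# [1] "a"  "aB" "b"

# Hypothetical vector with one all-uppercase element
v2 <- c("a", "AB", "b")
v2[grep("^[A-Z]+$", v2, invert = TRUE)]
# [1] "a" "b"
```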

Upvotes: 7
