Reputation: 4299
I would like to remove all the rows containing UPPERCASE words.
My data looks like this:
dt
1 TRAVEL AND UNSPECIFIED TIME USE
2 TRAVEL BY PURPOSE
3 Travel related to unspecified time use
4 Travel related to personal business
I don't understand why this isn't working:
dt[-c(grep('[A-Z]', dt$dt)) , ]
Strangely, the same approach works when I generate random data on mtcars, like this:
l = sample( c(letters[1:16], LETTERS[1:16]) )
mtcars$code = l
mtcars[-c( grep('[A-Z]', mtcars$code) ) , ]
Can someone help me? Here is the data:
dt = c("TRAVEL AND UNSPECIFIED TIME USE",
"TRAVEL BY PURPOSE",
"Travel related to unspecified time use",
"Travel related to personal business")
dt = as.data.frame(dt)
dt$dt = as.character(dt$dt)
Upvotes: 3
Views: 2413
Reputation: 887193
In addition to capital letters, the strings also contain spaces, so we can match one or more capital letters or spaces ([A-Z ]+) from the start (^) of the string to the end ($) in grepl, and negate (!) to keep the elements that are lower-case, mixed case, or anything else.
dt[!grepl("^[A-Z ]+$", dt$dt), , drop = FALSE]
# dt
#3 Travel related to unspecified time use
#4 Travel related to personal business
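To see why the original attempt removed every row, it may help to compare the two patterns directly. This is a minimal sketch that rebuilds the OP's data; the names any_upper and all_upper are only illustrative. The unanchored [A-Z] matches any string that contains at least one capital letter, and every row here starts with a capital "T".

```r
# Rebuild the example data from the question
dt <- data.frame(dt = c("TRAVEL AND UNSPECIFIED TIME USE",
                        "TRAVEL BY PURPOSE",
                        "Travel related to unspecified time use",
                        "Travel related to personal business"),
                 stringsAsFactors = FALSE)

# Unanchored: TRUE wherever at least one capital letter occurs
any_upper <- grepl("[A-Z]", dt$dt)

# Anchored: TRUE only when the whole string is capitals and spaces
all_upper <- grepl("^[A-Z ]+$", dt$dt)

# Side-by-side view of which rows each pattern flags
data.frame(any_upper, all_upper)
```

Since any_upper is TRUE for all four rows, dropping those indices leaves nothing, whereas all_upper flags only the first two rows.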
In the OP's other example 'l', there is only a single character per string, so [A-Z] works. However, it is better not to use - for this kind of subsetting. For example, suppose we have a vector in which no element is entirely upper-case:
v1 <- c('a', 'aB', 'b')
v1[-grep("^[A-Z]+$", v1)]
#character(0)
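because grep finds no match and returns an empty index vector, and negating an empty index selects nothing: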
grep("^[A-Z]+$", v1)
#integer(0)
However, negating with ! gives the expected output:
v1[!grepl("^[A-Z]+$", v1)]
#[1] "a" "aB" "b"
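If index-based subsetting is still preferred, one possible alternative (not part of the approach above, just a sketch) is grep's own invert argument, which returns the non-matching elements and so avoids the empty-index problem. The name kept is only illustrative.

```r
v1 <- c('a', 'aB', 'b')

# invert = TRUE returns the elements that do NOT match;
# value = TRUE returns the strings themselves rather than their positions
kept <- grep("^[A-Z]+$", v1, invert = TRUE, value = TRUE)
kept
```

With no all-upper-case element in v1, kept contains all three original strings, the same result as the !grepl version.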
Upvotes: 7