danielhong

Reputation: 77

R: Using regmatches to extract certain characters

I'm using regmatches to extract only the capital letters from code, but the result contains an empty string "" in place of every lowercase letter and digit. Is there a way to extract just the capital letters without getting the "" entries?

code <- c("clcopCow1zmstc0d87wnkig7OvdicpNuggvhryn92Gjuwczi8hqrfpRxs5Aj5dwpn0TanwoUwisdij7Lj8kpf03AT5Idr3coc0bt7yczjatOaootj55t3Nj3ne6c4Sfek.r1w1YwwojigOd6vrfUrbz2.2bkAnbhzgv4R9i05zEcrop.wAgnb.SqoU65fPa1otfb7wEm24k6t3sR9zqe5fy89n6Nd5t9kc4fE905gmc4Rgxo5nhDk!gr")

regmatches(code, gregexpr('[[:punct:]]*[[:upper:][:punct:]]*', code))

Upvotes: 1

Views: 282

Answers (2)

Ben Bolker

Reputation: 226087

[^A-Z] is good, but [^[:upper:]] is a little better, as it won't get screwed up in peculiar locales, where the meaning of a range like A-Z depends on the collation order.

gsub("[^[:upper:]]", "", code)

For slightly better readability (but perhaps overkill for this example) you might want stringr::str_extract_all, but I'm not quite sure how to do this cleanly:

library(stringr)
str_c(str_extract_all(code, "[[:Lu:]]+")[[1]], collapse = "")
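
If you would rather stay with regmatches as in the question, the "" entries come from the * quantifiers: a pattern that is allowed to match nothing produces a zero-length match at every position that is not a capital letter. A minimal sketch in base R, assuming that only the capitals are wanted (so the [:punct:] classes are dropped) and using a short made-up string rather than the one from the question:

```r
# Hypothetical short input, for illustration only
code <- "abC1dE.fG"

# '+' requires at least one uppercase letter per match,
# so gregexpr() can no longer return zero-length matches
m <- regmatches(code, gregexpr("[[:upper:]]+", code))

m[[1]]
# [1] "C" "E" "G"

# Collapse the pieces into a single string
result <- paste(m[[1]], collapse = "")
result
# [1] "CEG"
```

The gsub version above is still shorter. This form is mainly useful if you want the individual runs of capitals before pasting them together.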

Upvotes: 2

Julius Vainora

Reputation: 48201

gsub("[^A-Z]", "", code)
# [1] "CONGRATULATIONSYOUAREASUPERNERD"

Upvotes: 2
