Clemsang

Reputation: 5491

First two letters from words Regex in R

I am trying to extract the first two letters of each word in a string: the initial uppercase letter and the lowercase letter that follows it.

string<-"Programmation _ Is 2 Cool"
gsub("[^A-Z]", "", string)
gsub("[^A-Za-z]", "", string)

The two results are:

"PIC"
"ProgrammationIsCool"

I would like to get :

"PrIsCo"

Thanks for your help.

Upvotes: 2

Views: 1913

Answers (1)

Wiktor Stribiżew

Reputation: 627100

To extract the uppercase letter at the start of each word together with the lowercase letter that follows it, use

(\\b[A-Z][a-z])|.

or

(\\b\\p{Lu}\\p{Ll})|.

The idea is to match and capture the first uppercase letter of each word together with the lowercase letter that follows it, and to remove everything else:

gsub("(\\b[A-Z][a-z])|.", "\\1", string, perl=TRUE)
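
As a quick check against the string from the question, the call can be wrapped in a small helper. This is only a sketch; the name first_two is hypothetical and does not come from the answer.

```r
# Hypothetical helper: keep the first uppercase+lowercase pair of each word.
first_two <- function(x) {
  # Group 1 captures the pair. The bare "." alternative consumes every other
  # character, and since group 1 is empty for those matches, replacing with
  # "\\1" drops them.
  gsub("(\\b[A-Z][a-z])|.", "\\1", x, perl = TRUE)
}

string <- "Programmation _ Is 2 Cool"
first_two(string)
# [1] "PrIsCo"
```

Because gsub is vectorised, the same helper should also work on a character vector of several strings.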

Note that with perl=TRUE the . does not match a newline, so to remove newlines as well you need to prepend (?s) to the pattern.
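
A minimal sketch of the difference, using a made-up two-line input (the variable names x, without_s, and with_s are illustrative only):

```r
# Illustrative input containing a newline between two words.
x <- "Hello\nWorld"

# Without (?s): "." skips the newline, so it survives in the result.
without_s <- gsub("(\\b[A-Z][a-z])|.", "\\1", x, perl = TRUE)

# With (?s): "." also matches the newline, so it is removed like any
# other character that group 1 does not capture.
with_s <- gsub("(?s)(\\b[A-Z][a-z])|.", "\\1", x, perl = TRUE)

without_s  # "He\nWo"
with_s     # "HeWo"
```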

Pattern details:

  • (\\b[A-Z][a-z]) - Capturing group 1, which matches:
    • \\b - a word boundary
    • [A-Z][a-z] - an uppercase ASCII letter followed by a lowercase ASCII letter (replace with \\p{Lu}\\p{Ll} to match any Unicode uppercase-lowercase letter pair).
  • | - or
  • . - any character but a newline
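
To see what group 1 captures on its own, the matches can be pulled out directly instead of deleting everything else. This is a different technique from the gsub call in the answer (extraction with base R's gregexpr and regmatches rather than replacement), shown here only as a sketch; the variable names are illustrative.

```r
string <- "Programmation _ Is 2 Cool"

# Find every "word boundary + uppercase + lowercase" match in the string.
matches <- gregexpr("\\b[A-Z][a-z]", string, perl = TRUE)

# regmatches returns a list with one character vector per input string.
pieces <- regmatches(string, matches)[[1]]
pieces
# [1] "Pr" "Is" "Co"

# Gluing the pieces together gives the same result as the gsub approach.
paste(pieces, collapse = "")
# [1] "PrIsCo"
```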

Upvotes: 4
