How to find the longest string in a text using regex in R

Question

Given a string x, I can count the number of words in it using gregexpr("[A-Za-z]\\w+", x).

> x<-"





Masters Publics












Masters Universitaires et Prives au Maroc







\n



Masters Par Ville












"
> sapply(gregexpr("[A-Za-z]\\w+", x), function(x) sum(x > 0))
[1] 11

However, how can I retrieve the number of words in the longest contiguous string (words separated by spaces, not line breaks) using regex in R?

In this example it would be "Masters Universitaires et Prives au Maroc", which has a length of 6.

Thanks in advance.

Wiktor Stribiżew · Accepted Answer

I would solve it with

x <- "





Masters Publics












Masters Universitaires et Prives au Maroc







\n



Masters Par Ville












"
max(nchar(gsub("[^ ]+", "", unlist(strsplit(trimws(x), "\n+"))))) + 1

Trim the string, split it into lines, unlist the result, remove every character other than a space from each line, take the largest remaining space count, and add one (a line of n words separated by single spaces contains n - 1 spaces). The [^ ]+ regex matches one or more (due to the + quantifier) characters other than a space ([^...] is a negated character class).

See IDEONE demo.
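The space-counting approach above relies on words being separated by exactly one space. As a hedged alternative, here is a minimal sketch that reuses the word pattern from the question and counts matches per line instead. The function name longest_line_words is hypothetical, and the sample string is a shortened stand-in for x, not the original input.

```r
# Hypothetical helper: number of words on the line that contains the most words.
longest_line_words <- function(x) {
  # Trim leading/trailing whitespace, then split on runs of newlines
  lines <- unlist(strsplit(trimws(x), "\n+"))
  # Count matches of the question's word pattern on each line;
  # gregexpr returns -1 for a line with no match, so sum(m > 0) yields 0 there
  counts <- sapply(gregexpr("[A-Za-z]\\w+", lines), function(m) sum(m > 0))
  max(counts)
}

# Shortened stand-in for the string in the question (an assumption, not the original x)
x <- "\nMasters Publics\n\n\nMasters Universitaires et Prives au Maroc\n\n\nMasters Par Ville\n"
longest_line_words(x)
# [1] 6
```

Because it counts word matches rather than spaces, this variant should tolerate repeated spaces inside a line, at the cost of inheriting the pattern's rule that a word starts with an ASCII letter and has at least two characters.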
