Reputation: 380
Given a string x, I can count the number of words in it using gregexpr("[A-Za-z]\\w+", x):
> x<-"\n\n\n\n\n\nMasters Publics\n\n\n\n\n\n\n\n\n\n\n\n\nMasters Universitaires et Prives au Maroc\n\n\n\n\n\n\n\n\\n\n\n\n\nMasters Par Ville\n\n\n\n\n\n\n\n\n\n\n\n\n"
> sapply(gregexpr("[A-Za-z]\\w+", x), function(x) sum(x > 0))
[1] 11
However, how can I use a regex in R to get the number of words in the longest line, i.e. the longest run of words separated by spaces rather than by \n?
In this example that would be "Masters Universitaires et Prives au Maroc", which has 6 words.
Thanks in advance.
Upvotes: 3
Views: 1796
Reputation: 627101
I would solve it with
x <- "\n\n\n\n\n\nMasters Publics\n\n\n\n\n\n\n\n\n\n\n\n\nMasters Universitaires et Prives au Maroc\n\n\n\n\n\n\n\n\\n\n\n\n\nMasters Par Ville\n\n\n\n\n\n\n\n\n\n\n\n\n"
max(nchar(gsub("[^ ]+", "", unlist(strsplit(trimws(x), "\n+"))))) + 1
Split the trimmed string into lines, unlist the result, remove all characters other than spaces, take the largest number of remaining spaces and add one (n single spaces separate n + 1 words). The [^ ]+ pattern matches one or more (due to the + quantifier) characters other than a space ([^...] is a negated character class).
See IDEONE demo.
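For readers who find the one-liner dense, here is a minimal step-by-step sketch of the same idea. The input is a shortened version of the string from the question, and the intermediate names (lines, spaces, words) are introduced here only for illustration. It assumes words within a line are separated by single spaces.

```r
# Shortened version of the question's input (assumption: same structure,
# fewer newlines; note the literal backslash + "n" fragment in the middle).
x <- "\n\nMasters Publics\n\n\nMasters Universitaires et Prives au Maroc\n\n\\n\n\nMasters Par Ville\n\n"

# 1. Trim leading/trailing whitespace, then split on runs of newlines.
lines <- unlist(strsplit(trimws(x), "\n+"))

# 2. Delete everything that is not a space and count what is left per line.
spaces <- nchar(gsub("[^ ]+", "", lines))

# 3. n single spaces separate n + 1 words.
words <- spaces + 1

max(words)               # word count of the longest line
lines[which.max(words)]  # the longest line itself
```

Keeping the per-line counts in a vector also makes it easy to retrieve the line itself with which.max(), not just its length.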
Upvotes: 2
Reputation: 1846
Load the package
library(stringr)
Split the string on newlines, then flag the non-empty lines
data <- unlist(str_split(x, pattern="\n", n = Inf))
index <- nchar(data) != 0  # TRUE for non-empty lines
# maximum number of words per line (runs of non-word characters + 1)
max(sapply(gregexpr("\\W+", data[index]), length) + 1)
[1] 6
# just checking
data[index]
[1] "Masters Publics"
[2] "Masters Universitaires et Prives au Maroc"
[3] "\\n"
[4] "Masters Par Ville"
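One caveat with counting separators: gregexpr() returns -1 (a vector of length 1) when nothing matches, so a line consisting of a single word is counted as 2 words. That does not affect the maximum here, but a variant that counts the words directly avoids it. The following is only a sketch in base R; the sample string x_demo and the variable names are made up for illustration.

```r
# Hypothetical sample with a single-word line ("Maroc") to show the edge case.
x_demo <- "\n\nMasters Publics\n\nMasters Universitaires et Prives au Maroc\n\nMaroc\n\n"

# Split on newlines and drop the empty pieces.
lines <- unlist(strsplit(x_demo, "\n", fixed = TRUE))
lines <- lines[nzchar(lines)]

# Separator-based count: no match yields -1, so "Maroc" is counted as 2.
sep_counts <- sapply(gregexpr("\\W+", lines), length) + 1

# Word-based count: extract the runs of letters and count them per line.
word_counts <- lengths(regmatches(lines, gregexpr("[[:alpha:]]+", lines)))

sep_counts        # 2 6 2
word_counts       # 2 6 1
max(word_counts)  # 6
```

regmatches() returns an empty character vector for a line without matches, so lengths() reports 0 for it rather than 1.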
Upvotes: 1