Reputation: 21
For example:
x<-"Saint A/74/PV.46 12/12/2019 4/66 19-40538 Lucia"
Should give me "Saint Lucia".
I tried
trimws(gsub("\\w*[0-9]+\\w*\\s*", "", x))
which gave me
Saint A//PV.///-Lucia
Any help would be very much appreciated.
Upvotes: 2
Views: 260
Reputation: 6529
You could use gsub to replace everything from the first space (" ") to the last space with a single space.
x <- "Saint A/74/PV.46 12/12/2019 4/66 19-40538 Lucia"
gsub(" .+ ", " ", x)
[1] "Saint Lucia"
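One caveat worth noting: the pattern relies on .+ being greedy, so it keeps only the first and last space-separated tokens. A small sketch shows what happens when a word to keep sits in the middle. The second input, y, is a made-up example and not from the question.

```r
# Input from the question: the words to keep are the first and last tokens.
x <- "Saint A/74/PV.46 12/12/2019 4/66 19-40538 Lucia"
res_x <- gsub(" .+ ", " ", x)
print(res_x)
# [1] "Saint Lucia"

# Hypothetical input (an assumption for illustration): wanted words
# also appear between the first and the last space.
y <- "Saint Kitts A/74/PV.46 12/12/2019 and Nevis"
res_y <- gsub(" .+ ", " ", y)
print(res_y)
# [1] "Saint Nevis"   ("Kitts" and "and" are dropped as well)
```

So this approach fits strings shaped exactly like the example; for anything less regular, a pattern that tests each token is probably safer.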
Upvotes: 0
Reputation: 887971
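We could use gsub to match runs of uppercase letters, digits, and the characters /, . and - between word boundaries (\\b), replace them with a blank (""), and then collapse the leftover runs of spaces into a single space with a second gsub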
gsub("\\s{2,}", " ", gsub("\\b[A-Z/0-9.-]+\\b", "", x))
#[1] "Saint Lucia"
Or using str_extract_all from stringr to pull out the letter-only words and paste them back together
library(stringr)
str_c(str_extract_all(x, "(?<= |^)[[:alpha:]]+(?= |$)")[[1]], collapse = " ")
#[1] "Saint Lucia"
Upvotes: 2
Reputation: 627536
You can use a replacement approach, with either base R or stringr:
x<-"Saint A/74/PV.46 12/12/2019 4/66 19-40538 Lucia"
gsub("\\s*(?<!\\S)(?!\\p{L}+(?!\\S))\\S+", "", x, perl=TRUE)
## => [1] "Saint Lucia"
library(stringr)
str_replace_all(x, "\\s*(?<!\\S)(?!\\p{L}+(?!\\S))\\S+", "")
## => [1] "Saint Lucia"
See the R demo. See the regex demo. Details:
\s* - zero or more whitespace characters
(?<!\S) - start of string or a position immediately preceded by a whitespace character
(?!\p{L}+(?!\S)) - the next non-whitespace chunk cannot be a letter-only word
\S+ - one or more non-whitespace characters.
Or, you may match all letter-only words between whitespace boundaries and join the matches with a space:
paste(unlist(regmatches(x, gregexpr("(?<!\\S)\\p{L}+(?!\\S)", x, perl=TRUE))), collapse=" ")
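Note that unlist() pools the matches from every element, so the one-liner above is meant for a single string. If x were a character vector, something like the following sketch might be closer to what you want. The helper name keep_letter_words and the second test string are assumptions for illustration, not part of the question.

```r
# Hypothetical helper: keep only the letter-only words of each element.
keep_letter_words <- function(x) {
  # One vector of matches per input string.
  m <- regmatches(x, gregexpr("(?<!\\S)\\p{L}+(?!\\S)", x, perl = TRUE))
  # Join each element's matches separately instead of pooling them with unlist().
  vapply(m, paste, character(1), collapse = " ")
}

res <- keep_letter_words(c(
  "Saint A/74/PV.46 12/12/2019 4/66 19-40538 Lucia",
  "19-40538 Peru 4/66"   # made-up second input
))
print(res)
# [1] "Saint Lucia" "Peru"
```

An element with no letter-only words comes back as an empty string, since paste() on an empty vector with collapse set returns "".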
See the R demo online. Also, see the regex demo. The pattern matches:
(?<!\S) - a position at the start of string or right after a whitespace character
\p{L}+ - one or more Unicode letters
(?!\S) - immediately on the right, there must be a whitespace character or the end of string.
Upvotes: 1