Reputation: 518
I'm trying to figure out how to extract the names in the character string:
str <- "Bob 1/4 F4 Mary Lou 5/1 Thomas Tank 66/19"
to a vector: "Bob", "Mary Lou", "Thomas Tank"
I have the following which returns "Bob". Can anyone tell me how to match the following globally?
cVec <- ""
findMatch <- regexpr("[^0-9]+", str)
cVec <- append(cVec, regmatches(str,findMatch))
cVec
Ideally I'd like a list with both the name and fraction elements, e.g. "Bob", "1/4", "Mary Lou", "5/1", "Thomas Tank", "66/19". But I suspect that F4 is going to be difficult (it's not needed). I'd settle for the names!
Cheers.
Upvotes: 1
Views: 114
Reputation: 81683
You can extract the names and fractions with the following command:
regmatches(str, gregexpr("[[:alpha:]]+( [[:alpha:]]+)?\\b|\\d+/\\d+", str))
# [[1]]
# [1] "Bob" "1/4" "Mary Lou" "5/1" "Thomas Tank"
# [6] "66/19"
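If the names and fractions need to stay paired, the flat result can be reshaped. This is a minimal sketch, assuming every name is followed by exactly one fraction (true for the sample string, not guaranteed in general). The names input, tokens and pairs are illustrative and not part of the original answer:

```r
# Sample string from the question (called `input` here to avoid masking base::str)
input <- "Bob 1/4 F4 Mary Lou 5/1 Thomas Tank 66/19"

# Same pattern as above: a one- or two-word name, or a fraction
pattern <- "[[:alpha:]]+( [[:alpha:]]+)?\\b|\\d+/\\d+"
tokens <- regmatches(input, gregexpr(pattern, input))[[1]]

# Assumes strict name/fraction alternation; fill the matrix row by row
pairs <- matrix(tokens, ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("name", "fraction")))
```

pairs[, "name"] then holds the names and pairs[, "fraction"] the matching fractions.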
Upvotes: 4
Reputation: 167871
I'm not familiar with R's regex syntax, but the following Java regex matches the whole expression (\s means whitespace; \d means a digit, [0-9]; () is a group; R seems to agree):
"([A-Za-z]+\\s)+(\\d+/\\d+(\\s[A-Z][\\d+])?)"
In Java there's a find method that lets you walk through pattern matches. In R, I think it's gregexpr, except this gives you a list of indices, not the strings themselves.
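In R, regmatches turns the positions returned by gregexpr back into strings. Here is a rough sketch of the Java pattern used that way, assuming perl = TRUE (PCRE) is close enough to Java's syntax for this pattern. The variable names, and the follow-up step that splits each record, are illustrative additions:

```r
# Sample string from the question (called `input` here to avoid masking base::str)
input <- "Bob 1/4 F4 Mary Lou 5/1 Thomas Tank 66/19"

# The Java pattern from above, unchanged; perl = TRUE selects the PCRE engine
pattern <- "([A-Za-z]+\\s)+(\\d+/\\d+(\\s[A-Z][\\d+])?)"
hits <- gregexpr(pattern, input, perl = TRUE)  # match positions and lengths
records <- regmatches(input, hits)[[1]]        # the matched substrings

# Split each record: the fraction, and everything before it as the name
fractions <- regmatches(records, regexpr("\\d+/\\d+", records))
name_part <- sub("\\s+\\d+/\\d+.*$", "", records)
```

Each element of records is one whole entry (name, fraction and the optional F4-style token), which is why the extra splitting step is needed.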
Upvotes: 0
Reputation: 11132
I don't know R, so I can't provide you with implementation. However, I think a solution could be made with this regex:
(?<=^| )[a-zA-Z]+(?: [a-zA-Z]+)?(?= |$)|[0-9]+/[0-9]+
It will match Bob, 1/4, Mary Lou, 5/1, Thomas Tank, and 66/19, but not F4.
Online explanation and demonstration here: http://regex101.com/r/vB8rU5
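For what it's worth, here is a sketch of how this regex might be applied in R. It assumes perl = TRUE, since the lookbehind and lookahead need the PCRE engine, and that backslash-free patterns like this one can be passed as-is. The variable names are illustrative:

```r
# Sample string from the question (called `input` here to avoid masking base::str)
input <- "Bob 1/4 F4 Mary Lou 5/1 Thomas Tank 66/19"

# Lookarounds require the PCRE engine, hence perl = TRUE below
pattern <- "(?<=^| )[a-zA-Z]+(?: [a-zA-Z]+)?(?= |$)|[0-9]+/[0-9]+"
tokens <- regmatches(input, gregexpr(pattern, input, perl = TRUE))[[1]]

# Separate names from fractions by testing for the slash
is_fraction <- grepl("/", tokens, fixed = TRUE)
name_part <- tokens[!is_fraction]
fractions <- tokens[is_fraction]
```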
Upvotes: 2
Reputation: 89547
You can do it like this:
str <- "Bob 1/4 F4 Mary Lou 5/1 Thomas Tank 66/19"
m <- gregexpr("(?i)\\b[a-z]+(?: [a-z]+)*\\b", str, perl = TRUE)
regmatches(str, m)
# [[1]]
# [1] "Bob" "Mary Lou" "Thomas Tank"
Upvotes: 0
Reputation: 20045
At the end of the day, this is way too fuzzy to give a solid, general solution. But this would do the trick; you would just have to trim the names:
> strsplit(str, "[0-9][ 0-9F/]+[0-9]")[[1]]
[1] "Bob " " Mary Lou " " Thomas Tank "
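The regular expression describes the separators to split on: a run that starts and ends with a digit and contains only digits, spaces, slashes and F in between (so "1/4 F4", "5/1" and "66/19"). What is left between the separators are the names.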
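The trimming step could look like the following sketch, which assumes trimws from base R (available since R 3.2.0). The variable names are illustrative:

```r
# Sample string from the question (called `input` here to avoid masking base::str)
input <- "Bob 1/4 F4 Mary Lou 5/1 Thomas Tank 66/19"

# Split on the digit/fraction runs, then strip the leftover spaces
pieces <- strsplit(input, "[0-9][ 0-9F/]+[0-9]")[[1]]
name_part <- trimws(pieces)
```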
Upvotes: 0