I have the following character vector:
strings <- c("David, FC; Haramey, S; Devan, IA",
"Colin, Matthew J.; Haramey, S",
"Colin, Matthew")
If I want the last initials/given name for every string, I can use the following:
sub(".*, ", "", strings)
[1] "IA" "S" "Matthew"
This removes everything up to and including the last ", ".
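Whether the match ends at the last or the first ", " comes down to greedy versus lazy matching. The following is a minimal illustrative sketch, not part of the original question. It contrasts the greedy `.*` with the lazy `.*?`. The lazy version passes `perl = TRUE` because PCRE definitely supports `.*?`.

```r
strings <- c("David, FC; Haramey, S; Devan, IA",
             "Colin, Matthew J.; Haramey, S",
             "Colin, Matthew")

# Greedy: ".*" consumes as much as possible, so the match ends at the LAST ", "
last_part <- sub(".*, ", "", strings)
# value: c("IA", "S", "Matthew")

# Lazy: ".*?" consumes as little as possible, so the match ends at the FIRST ", "
after_first <- sub(".*?, ", "", strings, perl = TRUE)
# value: c("FC; Haramey, S; Devan, IA", "Matthew J.; Haramey, S", "Matthew")
```

The lazy result is the first half of the problem. The given name still has to be cut off at the next space or semicolon.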
However, I am stuck on how to get the first initials/given name. I know I have to remove everything up to the first ", ",
but then I also have to remove everything from the first space or semicolon onward, if there is one.
To be clear, the output I want is:
c("FC", "Matthew", "Matthew")
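The two steps described in the question can be chained as two `sub()` calls. This minimal sketch assumes that a given name always ends at the first space or semicolon. Unlike the regex in the answer, it does not also stop at a period. The variable names are illustrative only.

```r
strings <- c("David, FC; Haramey, S; Devan, IA",
             "Colin, Matthew J.; Haramey, S",
             "Colin, Matthew")

# Step 1: drop everything up to and including the FIRST ", "
# ("[^,]*" cannot run past a comma, so the match stops at the first one)
after_comma <- sub("^[^,]*, ", "", strings)

# Step 2: drop everything from the first space or semicolon onward
# (strings with neither, like "Matthew", are left unchanged)
first_given <- sub("[ ;].*$", "", after_comma)
# value: c("FC", "Matthew", "Matthew")
```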
Any pointers would be great.
By fiddling, I can get the first surnames (with a trailing comma) using gsub(" .*$", "", strings)
You can use
> gsub("^[^\\s,]+,\\s+([^;.\\s]+).*", "\\1", strings, perl = TRUE)
[1] "FC" "Matthew" "Matthew"
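With `gsub()`, any string that does not match the pattern is returned unchanged. The sketch below adds a hypothetical extra input with no ", " to show this behavior. It then shows a different technique, `regexec()` with `regmatches()`, which yields `NA` for such strings. This alternative is an illustration, not part of the answer. It uses POSIX classes so that no `perl` argument is needed.

```r
strings <- c("David, FC; Haramey, S; Devan, IA",
             "Colin, Matthew J.; Haramey, S",
             "Colin, Matthew",
             "Madonna")  # hypothetical input with no ", "

# gsub(): a non-matching string comes back as-is
via_gsub <- gsub("^[^\\s,]+,\\s+([^;.\\s]+).*", "\\1", strings, perl = TRUE)

# regexec() + regmatches(): each element is c(full_match, group_1),
# or character(0) when the pattern does not match
m <- regmatches(strings,
                regexec("^[^[:space:],]+,[[:space:]]+([^;.[:space:]]+)", strings))

# Keep capture group 1, or NA when there was no match
via_regexec <- vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
                      character(1))
# via_gsub:    c("FC", "Matthew", "Matthew", "Madonna")
# via_regexec: c("FC", "Matthew", "Matthew", NA)
```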
Explanation:
^ - start of string
[^\\s,]+ - 1 or more characters other than whitespace or ,
, - a literal comma
\\s+ - 1 or more whitespace characters
([^;.\\s]+) - Group 1: 1 or more characters other than ;, . or whitespace
.* - zero or more characters other than a newline (the rest of the string)

If you want to use a POSIX-like expression, replace \\s inside the character classes (inside [...]) with [:blank:] (or [:space:]):
gsub( "^[^[:blank:],]+,\\s+([^;.[:blank:]]+).*", "\\1", strings)
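A quick sanity check is to run both patterns on the sample vector and confirm they give the same result. This minimal sketch does that. The PCRE pattern is the one from the answer, and the second pattern is the POSIX variant shown above, which uses R's default (TRE) engine.

```r
strings <- c("David, FC; Haramey, S; Devan, IA",
             "Colin, Matthew J.; Haramey, S",
             "Colin, Matthew")

# PCRE version: \s is allowed inside the character classes
pcre <- gsub("^[^\\s,]+,\\s+([^;.\\s]+).*", "\\1", strings, perl = TRUE)

# POSIX-like version: [:blank:] replaces \s inside the classes;
# \s outside a class is still accepted by the default engine
posix <- gsub("^[^[:blank:],]+,\\s+([^;.[:blank:]]+).*", "\\1", strings)

same <- identical(pcre, posix)
# posix: c("FC", "Matthew", "Matthew"); same: TRUE
```

[:blank:] matches only spaces and tabs. [:space:] also matches newlines, carriage returns and other vertical whitespace, so it is the closer equivalent of \s.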