Reputation: 371
I have many names of people in my character vector:
MLB$Names[1:4]
[1] "Derek Jeter" "Robinson Cano" "Nick Markakis" "David Ortiz"
I would like to format them to contain the first initial, with a period, followed by a space and their last name. I want it to look like the following:
MLB$NamesFormatted[1:4]
[1] "D. Jeter" "R. Cano" "N. Markakis" "D. Ortiz"
I'm assuming the best way to attack this would be by using grep or sub, but I can't for the life of me figure it out. I'm still a rookie at using R, but I'm loving all of its capabilities!
Any help would be greatly appreciated! Thank you!
Upvotes: 1
Views: 136
Reputation: 7443
We can use strsplit and paste:
x <- c("Derek Jeter", "Robinson Cano", "Nick Markakis", "David Ortiz")
sapply(strsplit(x, " "), function(x) paste0(substr(x[1], 1, 1), ". ", x[2]))
[1] "D. Jeter" "R. Cano" "N. Markakis" "D. Ortiz"
We first split each string into first name and surname, which gives a list; we then sapply over it with an anonymous function that: i) takes the initial of the first name, ii) adds a dot and a space, iii) adds the family name.
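The steps above can be wrapped in a small helper. This is only a sketch: the name format_initial is hypothetical, and it assumes every entry is exactly "First Last" separated by a single space. It uses vapply instead of sapply so that the result is guaranteed to be a character vector.

```r
# Hypothetical helper (not from the original answer).
# Assumes each name is "First Last" separated by a single space.
format_initial <- function(full_names) {
  # Split each name on the literal space; the result is a list of character vectors
  parts <- strsplit(full_names, " ", fixed = TRUE)
  # For each split name: first letter of the first word, then ". ", then the second word
  vapply(parts, function(p) {
    paste0(substr(p[1], 1, 1), ". ", p[2])
  }, character(1))
}

x <- c("Derek Jeter", "Robinson Cano", "Nick Markakis", "David Ortiz")
formatted <- format_initial(x)
# formatted is c("D. Jeter", "R. Cano", "N. Markakis", "D. Ortiz")
```

Names with more than two words would lose everything after the second word with this approach, since only p[1] and p[2] are used.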
Upvotes: 1
Reputation: 887213
We can use sub by capturing the first character as a group (^(.)), followed by one or more non-whitespace characters (\\S+), followed by a second capture group of one or more whitespace characters and the rest of the string ((\\s+.*)) up to the end ($). We replace the match with the first backreference (\\1), followed by a ., followed by the second backreference (\\2).
sub("^(.)\\S+(\\s+.*)$", "\\1.\\2", MLB$Names)
#[1] "D. Jeter" "R. Cano" "N. Markakis" "D. Ortiz"
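To see how the capture groups behave outside the four sample names, here is a small sketch. The extra names are hypothetical inputs chosen for illustration and are not part of the MLB data.

```r
# Pattern and replacement copied from the sub() call above
pattern <- "^(.)\\S+(\\s+.*)$"

# Hypothetical inputs: a two-word name, a three-word name, a single-word name
extra <- c("Derek Jeter", "Ken Griffey Jr.", "Ichiro")
res_capture <- sub(pattern, "\\1.\\2", extra)

# "Ken Griffey Jr." keeps everything after the first name: "K. Griffey Jr."
# "Ichiro" contains no whitespace, so the pattern does not match
# and sub() returns the string unchanged
```

Because the second group captures everything from the first run of whitespace to the end of the string, suffixes and middle names are preserved rather than dropped.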
Or it can be done more compactly by matching one or more lower case letters ([a-z]+) and replacing them with a . (sub replaces only the first match, so only the rest of the first name is affected).
sub("[a-z]+", ".", MLB$Names)
#[1] "D. Jeter" "R. Cano" "N. Markakis" "D. Ortiz"
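The compact pattern relies on the first run of lower case letters being the tail of the first name. A rough sketch of where that assumption breaks, using a hypothetical input that is not in the MLB data (and assuming a locale in which [a-z] matches only lower case ASCII letters):

```r
# Hypothetical inputs: the second name starts with an all-caps first name
tricky <- c("Derek Jeter", "CC Sabathia")

# Compact version: replaces the first run of lower case letters, wherever it is
compact <- sub("[a-z]+", ".", tricky)

# Anchored capture-group version from earlier in this answer
anchored <- sub("^(.)\\S+(\\s+.*)$", "\\1.\\2", tricky)

# compact  is c("D. Jeter", "CC S.")        (the surname was truncated instead)
# anchored is c("D. Jeter", "C. Sabathia")
```

For data that follows the plain "First Last" capitalisation in the question, both versions give the same result; the anchored pattern is simply less dependent on letter case.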
Here is another option with strsplit, where we split by one or more lower case letters followed by one or more spaces ([a-z]+\\s+), loop over the list with vapply, and paste the strings together.
vapply(strsplit(MLB$Names, "[a-z]+\\s+"), paste, collapse=". ", character(1))
#[1] "D. Jeter" "R. Cano" "N. Markakis" "D. Ortiz"
# Data
MLB <- data.frame(Names = c("Derek Jeter", "Robinson Cano",
                            "Nick Markakis", "David Ortiz"),
                  stringsAsFactors = FALSE)
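As a sketch of how the strsplit and vapply option fits together end to end, the following shows the intermediate list and stores the result in the NamesFormatted column that the question asks for. The variable name parts is only illustrative.

```r
# Same data as in the answer
MLB <- data.frame(Names = c("Derek Jeter", "Robinson Cano",
                            "Nick Markakis", "David Ortiz"),
                  stringsAsFactors = FALSE)

# Splitting on "lower case letters followed by whitespace" removes the tail of
# the first name and the space, leaving the initial and the surname
parts <- strsplit(MLB$Names, "[a-z]+\\s+")
first_parts <- parts[[1]]   # c("D", "Jeter")

# Join each pair with ". " and store it as a new column;
# collapse is passed through vapply to paste
MLB$NamesFormatted <- vapply(parts, paste, character(1), collapse = ". ")
```

Passing character(1) as FUN.VALUE makes vapply fail loudly if any element does not collapse to a single string, which sapply would not do.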
Upvotes: 1