Reputation: 5644
I have the following data:
Name
1 Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3 Heikkinen, Miss. Laina
4 Futrelle, Mrs. Jacques Heath (Lily May Peel)
5 Allen, Mr. William Henry
The data can be loaded like:
data <- structure(list(Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
"Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
"Allen, Mr. William Henry")), .Names = "Name", row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
My expected output is:
Name Title
1 Braund, Mr. Owen Harris Mr
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Mrs
3 Heikkinen, Miss. Laina Miss
4 Futrelle, Mrs. Jacques Heath (Lily May Peel) Mrs
5 Allen, Mr. William Henry Mr
The problem is that the code below sets every Title to just "Mr". I'm using a custom function with dplyr's mutate.
library('stringr')
library('dplyr')
extractTitle <- function(name) {
  str_match(name, '(\\b[a-zA-z]+)\\.')[2]
}
data <- data %>%
  mutate(Title = extractTitle(Name))
The weird thing is that if I change extractTitle to return the argument as is, it works as expected. For example:
extractTitle <- function(name) {
  name
}
data <- data %>%
  mutate(Title = extractTitle(Name))
The above code will return:
Name Title
1 Braund, Mr. Owen Harris Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3 Heikkinen, Miss. Laina Heikkinen, Miss. Laina
4 Futrelle, Mrs. Jacques Heath (Lily May Peel) Futrelle, Mrs. Jacques Heath (Lily May Peel)
5 Allen, Mr. William Henry Allen, Mr. William Henry
This is the row-by-row behavior I expected, and it differs from what the code I'm having trouble with does.
Is there something I'm missing here or is this a bug?
P.S. - I'm using dplyr version 0.5.0
Upvotes: 3
Reputation: 32986
library(dplyr)
library(stringr)
data %>%
  mutate(title = str_extract(string = Name, pattern = "(Mr|Miss|Mrs)\\.")) %>%
  select(Name, title)
which returns:
# A tibble: 6 x 2
Name title
<chr> <chr>
1 Braund, Mr. Owen Harris Mr.
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Mrs.
3 Heikkinen, Miss. Laina Miss.
4 Futrelle, Mrs. Jacques Heath (Lily May Peel) Mrs.
5 Allen, Mr. William Henry Mr.
6 Moran, Mr. James Mr.
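As for why the custom function in the question misbehaves, my reading is that this is not a dplyr bug. mutate calls extractTitle once with the whole Name column, not once per row. str_match then returns a character matrix with one row per name (full match in column 1, capture group in column 2), and [2] picks a single element out of that matrix, which mutate recycles to every row. Indexing the capture-group column with [, 2] keeps one value per name. The sketch below assumes the five-row data from the question (rebuilt here as a plain data.frame) and also swaps the [a-zA-z] range for [A-Za-z], since A-z additionally matches a few punctuation characters:

```r
library(stringr)
library(dplyr)

# Sample data from the question, rebuilt as a plain data.frame (assumption)
data <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
           "Allen, Mr. William Henry"),
  stringsAsFactors = FALSE
)

# The question's version: [2] selects one element of the match matrix
single <- str_match(data$Name, "\\b([A-Za-z]+)\\.")[2]
length(single)  # 1, so mutate would recycle it to all rows

# Vectorised version: [, 2] keeps the whole capture-group column
extractTitle <- function(name) {
  str_match(name, "\\b([A-Za-z]+)\\.")[, 2]
}

data <- data %>%
  mutate(Title = extractTitle(Name))

data$Title  # "Mr" "Mrs" "Miss" "Mrs" "Mr"
```

Unlike the str_extract version above, this drops the trailing period, which matches the expected output in the question.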
Upvotes: 2