Gjaldon

Reputation: 5644

Custom function returns the same value for all rows in dplyr's mutate

I have the following data:

                                                 Name
1                             Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3                              Heikkinen, Miss. Laina
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)
5                            Allen, Mr. William Henry

The data can be loaded like:

structure(list(Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", 
"Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)", 
"Allen, Mr. William Henry")), .Names = "Name", row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

My expected output is:

                                                 Name    Title
1                             Braund, Mr. Owen Harris       Mr
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)      Mrs
3                              Heikkinen, Miss. Laina     Miss
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)      Mrs
5                            Allen, Mr. William Henry       Mr

The problem is that the code below sets every Title to just "Mr". I'm using a custom function with dplyr's mutate.

library('stringr')
library('dplyr')

extractTitle <- function(name) {
  str_match(name, '(\\b[a-zA-Z]+)\\.')[2]
}

data <- data %>% 
          mutate(Title = extractTitle(Name))

The weird thing is that if I change extractTitle to return the argument as is, it works as expected. For example:

extractTitle <- function(name) {
  name
}

data <- data %>% 
          mutate(Title = extractTitle(Name))

The above code will return:

                                                 Name    Title
1                             Braund, Mr. Owen Harris   Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)   Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3                              Heikkinen, Miss. Laina   Heikkinen, Miss. Laina
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)   Futrelle, Mrs. Jacques Heath (Lily May Peel)
5                            Allen, Mr. William Henry   Allen, Mr. William Henry

This is the behavior I expect, and it differs from the behavior of the code I'm having trouble with.

Is there something I'm missing here or is this a bug?

P.S. - I'm using dplyr version 0.5.0

Upvotes: 3

Views: 1241

Answers (1)

Maiasaura

Reputation: 32986

library(dplyr)
library(stringr)

data %>%
  mutate(title = str_extract(string = Name, pattern = "(Mr|Miss|Mrs)\\.")) %>%
  select(Name, title)

which returns:

# A tibble: 6 x 2
                                                 Name title
                                                <chr> <chr>
1                             Braund, Mr. Owen Harris   Mr.
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)  Mrs.
3                              Heikkinen, Miss. Laina Miss.
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)  Mrs.
5                            Allen, Mr. William Henry   Mr.
6                                    Moran, Mr. James   Mr.
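
As for why the original function gives every row the same value: a likely cause is the [2] index. str_match() returns a character matrix with one row per input string (column 1 holds the full match, column 2 the first capture group), so [2] picks a single element out of that matrix and mutate recycles it across all rows. Below is a minimal sketch of a fix, assuming the same five names as in the question. The name extract_title is hypothetical, and the character class is written as [A-Za-z].

```r
library(stringr)
library(dplyr)

# Same five names as in the question.
data <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
           "Allen, Mr. William Henry"),
  stringsAsFactors = FALSE
)

# Hypothetical rewrite of extractTitle (not the asker's original code).
extract_title <- function(name) {
  # str_match() returns a matrix: column 1 is the full match,
  # column 2 is the first capture group. [, 2] keeps the whole
  # second column, so there is one title per input string.
  str_match(name, "\\b([A-Za-z]+)\\.")[, 2]
}

result <- data %>%
  mutate(Title = extract_title(Name))
```

Indexing with [, 2] keeps one title per row and drops the trailing period, which matches the expected output in the question.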

Upvotes: 2
