How can I apply a string function to a column?

Question

I have some data, shown below:

        date         over     bed.bath
1 2016-03-17 -0.002352941 1 bed 1 bath
2 2016-03-17 -0.035294118 1 bed 1 bath
3 2016-03-17 -0.008278717 1 bed 1 bath
4 2016-03-17 -0.008350731 1 bed 1 bath
5 2016-03-17  0.004243281 1 bed 2 bath
6 2016-03-17  0.007299270 2 bed 2 bath

The bed.bath column is a character vector. I'd like to extract the number of beds and the number of baths separately. I've tried splitting the string and pulling out the numbers like so:

getbeds <- function(x){

  splits = strsplit(x," ")

  return(splits[[1]][1])
}

However, when I use df <- df %>% mutate(beds = getbeds(bed.bath)), every value in the new column is 1.

        date         over     bed.bath beds
1 2016-03-17 -0.002352941 1 bed 1 bath    1
2 2016-03-17 -0.035294118 1 bed 1 bath    1
3 2016-03-17 -0.008278717 1 bed 1 bath    1
4 2016-03-17 -0.008350731 1 bed 1 bath    1
5 2016-03-17  0.004243281 1 bed 2 bath    1
6 2016-03-17  0.007299270 2 bed 2 bath    1

What is the best way to extract the information I want from my data frame?

Data

df <- structure(list(date = structure(c(16877, 16877, 16877, 16877, 16877, 16877), class = "Date"),
                     over = c(-0.002352941, -0.035294118, -0.008278717, -0.008350731, 0.004243281, 0.00729927),
                     bed.bath = c("1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath", "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath")),
                .Names = c("date", "over", "bed.bath"),
                row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

library('dplyr')
df %>% mutate(beds = getbeds(bed.bath))

akrun · Accepted Answer

We can use extract from tidyr

library(tidyr)
library(dplyr)
df %>% 
   extract(bed.bath, into = 'beds', "(\\d+).*", remove = FALSE)

Or with base R, using sub to match one or more spaces (\s+) followed by any remaining characters (.*) and replace the match with an empty string, so that only the number at the start of the string is kept.

df$beds <- with(df, as.integer(sub("\\s+.*", "", bed.bath)))
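
Since the question asks for beds and baths separately, the same sub idea can be extended with capture groups. This is only a sketch: it assumes every value follows the exact "<n> bed <n> bath" layout shown in the question, and the names bed_bath and pattern are made up for illustration.

```r
# Sample values mirroring the bed.bath column from the question
bed_bath <- c("1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath")

# Assumed layout: "<digits> bed <digits> bath"; each number gets a capture group
pattern <- "^(\\d+)\\s+bed\\s+(\\d+)\\s+bath$"

# "\\1" keeps the first captured number, "\\2" the second
beds  <- as.integer(sub(pattern, "\\1", bed_bath))
baths <- as.integer(sub(pattern, "\\2", bed_bath))

print(data.frame(bed_bath, beds, baths))
```

If a value does not match the pattern, sub returns it unchanged and as.integer gives NA with a warning, which makes unexpected formats easy to spot.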

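The OP's output has the same value in every row because strsplit returns a list with one element per row, and getbeds returns only the first token ([1]) of the first list element ([[1]]). mutate then recycles that single value across all rows.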
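
To make that concrete, here is a small sketch of what strsplit returns and how the OP's function could be vectorised instead. getbeds_vec is a hypothetical name, and this is just one possible fix, not necessarily the most efficient one.

```r
# Sample values mirroring the bed.bath column from the question
bed_bath <- c("1 bed 1 bath", "1 bed 1 bath", "1 bed 2 bath", "2 bed 2 bath")

# strsplit() returns a list with one character vector per input string
splits <- strsplit(bed_bath, " ")

# The original approach: first token of the first element only (length 1)
first_only <- splits[[1]][1]

# Vectorised variant (hypothetical name): first token of every element
getbeds_vec <- function(x) {
  vapply(strsplit(x, " "), function(parts) parts[1], character(1))
}

beds <- as.integer(getbeds_vec(bed_bath))
print(beds)
```

Because getbeds_vec returns one value per input string, it could be used inside mutate in place of getbeds.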
