questionmark

Reputation: 345

R/dplyr: How to only keep integers in a data frame?

I have a data frame that has years in it (data type chr):

Years:
5 yrs
10 yrs
20 yrs
4 yrs

I want to keep only the integers to get a data frame like this (data type num):

Years:
5
10
20
4

How do I do this in R?

Upvotes: 4

Views: 695

Answers (3)

hello_friend

Reputation: 5788

Base R solution:

clean_years <- as.numeric(gsub("\\D", "", Years))

Data:

Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "5 yrs")
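
Since the question title mentions dplyr, the same substitution can probably be applied to a data frame column inside mutate(). This is only a minimal sketch: the data frame name df and its contents are assumptions made up to mirror the question's example.

```r
library(dplyr)

# Hypothetical data frame mirroring the question's example (chr column)
df <- data.frame(Years = c("5 yrs", "10 yrs", "20 yrs", "4 yrs"),
                 stringsAsFactors = FALSE)

# Remove every non-digit character, then convert the result to numeric
df <- df %>%
  mutate(Years = as.numeric(gsub("\\D", "", Years)))

str(df$Years)
```

One caveat: because every non-digit is removed, a value such as "4-5 yrs" would become 45, so this only suits inputs that contain a single number.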

Upvotes: 1

Chuck P

Reputation: 3923

Per your additional requirements, here is a more general-purpose solution, although it has limits too. The nice thing about the more complicated Years3 solution is that it deals more gracefully with unexpected but quite possible answers, such as ranges.

library(dplyr)
library(stringr)
library(purrr)

Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "4-5 yrs",
           "75 to 100 YEARS old",
           ">1 yearsmispelled or whatever")
df <- data.frame(Years)

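# keep the first run of digits plus any text before it; loses the -5 in 4-5
# and gives NA when the string does not start with a digit (e.g. ">1 ...")
df$Years1 <- as.numeric(sub("(\\d{1,4}).*", "\\1", df$Years))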
#> Warning: NAs introduced by coercion

# first run of digits using str_extract; also loses the -5 in 4-5
# str_extract returns character, so convert to get a numeric column
df$Years2 <- as.numeric(str_extract(df$Years, "[0-9]+"))

# more work is needed to average ranges such as 4-5

df$Years3 <- str_extract_all(df$Years, "[0-9]+") %>%
  purrr::map( ~ ifelse(length(.x) == 1, 
                as.numeric(.x), 
                mean(unlist(as.numeric(.x)))))

df
#>                           Years Years1 Years2 Years3
#> 1                         5 yrs      5      5      5
#> 2                        10 yrs     10     10     10
#> 3                        20 yrs     20     20     20
#> 4                         4 yrs      4      4      4
#> 5                       4-5 yrs      4      4    4.5
#> 6           75 to 100 YEARS old     75     75   87.5
#> 7 >1 yearsmispelled or whatever     NA      1      1
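
The Years3 column above is built with purrr::map(), which returns a list. If a plain numeric vector is preferred, one possible variant is map_dbl(). This is a sketch rather than a drop-in replacement, and the names years_sample and years_avg are made up for illustration.

```r
library(stringr)
library(purrr)

# A few of the values from the example above
years_sample <- c("5 yrs", "4-5 yrs", "75 to 100 YEARS old")

# Extract every run of digits per element, then average them.
# mean() of a single number is that number, so no length check is needed.
years_avg <- map_dbl(str_extract_all(years_sample, "[0-9]+"),
                     ~ mean(as.numeric(.x)))

years_avg
```

An element with no digits at all would come back as NaN here, since the mean of an empty numeric vector is NaN.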

Upvotes: 1

Daniel O

Reputation: 4358

You need to extract the numbers and convert them to type numeric:

df$year <- as.numeric(sub(" yrs", "", df$year))
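
The sub() call above relies on the suffix being exactly " yrs". If the readr package is available, its parse_number() function may be a convenient alternative, because it pulls out the first number it finds regardless of the surrounding text. A minimal sketch, with a made-up data frame df and column year standing in for the real data:

```r
library(readr)

# Hypothetical data frame using the column name from the line above
df <- data.frame(year = c("5 yrs", "10 yrs", "20 yrs", "4 yrs"),
                 stringsAsFactors = FALSE)

# parse_number() drops non-numeric text around the first number it finds
df$year <- parse_number(df$year)

str(df$year)
```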

Upvotes: 4
