Reputation: 27

range selection in R

If I had a data.frame with two columns, name and age, what function would I use to go through the age column, select only the first number, and put it in a new column?

name    age
Jack    43 - 44 Years
Jill    37 - 38 Years
Mike    17 - 19 Years
Jan     21 - 22 Years
Steve   25 - 30 Years

I don't even know how to look this up because I don't know what it is called. I have done multiple searches to no avail. Sorry if it seems like I am being lazy, but I am very new to R and programming in general. Thank you for your time.

Upvotes: 2

Answers (4)

Aman J

Reputation: 1855

A tidy and clean way to do it is with separate() from tidyr:

library(tidyr)
# Split age on spaces and keep only the first piece in a new column
df %>% separate(age, c("New_Column"), remove = FALSE, extra = "drop", sep = " ")

Upvotes: 2

akrun

Reputation: 887951

We can use parse_number from readr

library(readr)
library(dplyr)
d %>% 
    mutate(age_low = parse_number(age))

-output

#   name           age age_low
#1  Jack, 43 - 44 Years      43
#2  Jill, 37 - 38 Years      37
#3  Mike, 17 - 19 Years      17
#4   Jan, 21 - 22 Years      21
#5 Steve, 25 - 30 Years      25

Another option is extract to split the column into two

library(tidyr)
extract(d, age, into = c('age_low', 'age_high'), "(\\d+)\\D+(\\d+).*")

data

d <- structure(list(name = c("Jack,", "Jill,", "Mike,", "Jan,", "Steve,"
), age = c("43 - 44 Years", "37 - 38 Years", "17 - 19 Years", 
"21 - 22 Years", "25 - 30 Years")), class = "data.frame", 
row.names = c(NA, 
-5L))
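
If tidyr is not available, the same two-column split can be sketched in base R. This is only an illustration under assumptions: the sample rows (including the single-digit one), the helper variables `parts` and `bounds`, and the choice to return NA for rows that do not match are mine, not part of the answer above.

```r
# Hypothetical sample data, modelled on the question
d <- data.frame(
  name = c("Jack", "Jill", "Mike"),
  age  = c("43 - 44 Years", "37 - 38 Years", "7 - 9 Years"),
  stringsAsFactors = FALSE
)

# regexec() records the full match plus each capture group;
# regmatches() turns that into one character vector per row
parts <- regmatches(d$age, regexec("([0-9]+)[^0-9]+([0-9]+)", d$age))

# Keep the two groups as integers; rows without a match become NA
bounds <- t(vapply(
  parts,
  function(p) if (length(p) == 3) as.integer(p[2:3]) else c(NA_integer_, NA_integer_),
  integer(2)
))

d$age_low  <- bounds[, 1]
d$age_high <- bounds[, 2]
d
```

Unlike the tidyr call, this keeps the original age column and yields integer columns directly.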

Upvotes: 3

stefan

Reputation: 125797

This can be done with stringr::str_extract.

The pattern "^\\d+" matches one or more digits (\\d+) at the beginning (^) of a string.

d <- read.table(text = "name age
Jack, '43 - 44 Years'
Jill, '37 - 38 Years'
Mike, '17 - 19 Years'
Jan, '21 - 22 Years'
Steve, '25 - 30 Years'", header = TRUE)


d$age_low <- stringr::str_extract(d$age, "^\\d+")
d
#>     name           age age_low
#> 1  Jack, 43 - 44 Years      43
#> 2  Jill, 37 - 38 Years      37
#> 3  Mike, 17 - 19 Years      17
#> 4   Jan, 21 - 22 Years      21
#> 5 Steve, 25 - 30 Years      25
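
Note that str_extract returns a character vector, so the new column holds strings rather than numbers. A minimal base-R sketch of the same idea that also converts to integer is shown here; the sample vector, the names `m` and `age_low`, and the NA handling for non-matching rows are assumptions for illustration.

```r
# Hypothetical input, including a value with no leading digits
age <- c("43 - 44 Years", "7 - 9 Years", "unknown")

# regexpr() gives the position of the first match, or -1 if there is none
m <- regexpr("^[0-9]+", age)

# regmatches() returns only the matched pieces, so fill a
# pre-sized NA vector to keep the result aligned with the input
age_low <- rep(NA_integer_, length(age))
age_low[m != -1] <- as.integer(regmatches(age, m))

age_low
# [1] 43  7 NA
```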

Upvotes: 2

Karthik S

Reputation: 11548

In case some of the age groups have a single digit in the lower range, this will work:

gsub('(\\d\\d?)(.*)','\\1',ar$age)

Data used:

> ar
# A tibble: 5 x 2
  name  age          
  <chr> <chr>        
1 Jack  43 - 44 Years
2 Jill  37 - 38 Years
3 Mike  17 - 19 Years
4 Jan   21 - 22 Years
5 Steve 25 - 30 Years
> ar$age1 <- gsub('(\\d\\d?)(.*)','\\1',ar$age)
> ar
# A tibble: 5 x 3
  name  age           age1 
  <chr> <chr>         <chr>
1 Jack  43 - 44 Years 43   
2 Jill  37 - 38 Years 37   
3 Mike  17 - 19 Years 17   
4 Jan   21 - 22 Years 21   
5 Steve 25 - 30 Years 25   
>
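
One caveat: the pattern above captures at most two digits, so a three-digit lower bound would be cut short. A small sketch of the difference, assuming a made-up vector with such a value and a variant pattern of my own that accepts any number of digits:

```r
# Hypothetical ages, including a three-digit lower bound
age <- c("43 - 44 Years", "7 - 9 Years", "100 - 110 Years")

# Pattern from the answer: one or two leading digits
two_digit <- gsub("(\\d\\d?)(.*)", "\\1", age)

# Variant (an assumption, not from the answer): any run of leading digits
any_width <- sub("^[^0-9]*([0-9]+).*$", "\\1", age)

two_digit
# [1] "43" "7"  "10"
any_width
# [1] "43"  "7"   "100"
```

Both results are character vectors; wrap them in as.integer() if numbers are needed.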

Upvotes: 3
