Reputation: 27
If I had a data.frame with two columns, name and age, what function would I use to go through the age column, select only the first number, and put it in a new column?
name age
Jack 43 - 44 Years
Jill 37 - 38 Years
Mike 17 - 19 Years
Jan 21 - 22 Years
Steve 25 - 30 Years
I don't even know how to look this up because I don't know what it is called. I have done multiple searches to no avail. Sorry if it seems like I am being lazy; I am just very new to R and programming in general. Thank you for your time.
Upvotes: 2
Views: 91
Reputation: 1855
A tidy and clean way to do it is with separate() from tidyr:
library(tidyr)
df %>% separate(age, c("New_Column"), remove = FALSE, extra = "drop", sep = " ")
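Note that separate() returns the new column as character by default. A minimal sketch of getting a numeric column instead, assuming the data looks like the sample in the question (the data frame `df` and the column name `age_low` are made up here for illustration):

```r
library(tidyr)

# Sample data modeled on the question (assumed, not the asker's real data)
df <- data.frame(
  name = c("Jack", "Jill", "Mike", "Jan", "Steve"),
  age  = c("43 - 44 Years", "37 - 38 Years", "17 - 19 Years",
           "21 - 22 Years", "25 - 30 Years")
)

# Split on spaces, keep only the first piece, and let convert = TRUE
# turn that piece into an integer column
out <- separate(df, age, into = "age_low", sep = " ",
                remove = FALSE, extra = "drop", convert = TRUE)
out
```

With remove = FALSE the original age column is kept next to the new one.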
Upvotes: 2
Reputation: 887951
We can use parse_number from readr
library(readr)
library(dplyr)
d %>%
mutate(age_low = parse_number(age))
-output
# name age age_low
#1 Jack, 43 - 44 Years 43
#2 Jill, 37 - 38 Years 37
#3 Mike, 17 - 19 Years 17
#4 Jan, 21 - 22 Years 21
#5 Steve, 25 - 30 Years 25
Another option is extract to split the column into two
library(tidyr)
extract(d, age, into = c('age_low', 'age_high'), "(\\d+)\\D+(\\d+).*")
data
d <- structure(list(
  name = c("Jack,", "Jill,", "Mike,", "Jan,", "Steve,"),
  age = c("43 - 44 Years", "37 - 38 Years", "17 - 19 Years",
          "21 - 22 Years", "25 - 30 Years")),
  class = "data.frame", row.names = c(NA, -5L))
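As a small follow-up to the extract approach, here is a sketch that also converts both captured groups to integers, which may be handy if the bounds are needed for arithmetic. The data frame below is rebuilt from the sample rows and `out` is just an illustrative name:

```r
library(tidyr)

# Sample data rebuilt from the rows shown in the question (assumed)
d <- data.frame(
  name = c("Jack", "Jill", "Mike", "Jan", "Steve"),
  age  = c("43 - 44 Years", "37 - 38 Years", "17 - 19 Years",
           "21 - 22 Years", "25 - 30 Years")
)

# Two capture groups: the first run of digits and the next run of digits.
# convert = TRUE turns the captured strings into integers.
out <- extract(d, age, into = c("age_low", "age_high"),
               regex = "(\\d+)\\D+(\\d+).*", convert = TRUE)
out
```

By default extract() drops the original age column; passing remove = FALSE keeps it.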
Upvotes: 3
Reputation: 125797
Using stringr::str_extract, this could be achieved like so. The pattern "^\\d+" extracts one or more digits (\\d+) at the beginning (^) of a string.
d <- read.table(text = "name age
Jack, '43 - 44 Years'
Jill, '37 - 38 Years'
Mike, '17 - 19 Years'
Jan, '21 - 22 Years'
Steve, '25 - 30 Years'", header = TRUE)
d$age_low <- stringr::str_extract(d$age, "^\\d+")
d
#> name age age_low
#> 1 Jack, 43 - 44 Years 43
#> 2 Jill, 37 - 38 Years 37
#> 3 Mike, 17 - 19 Years 17
#> 4 Jan, 21 - 22 Years 21
#> 5 Steve, 25 - 30 Years 25
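One thing to keep in mind is that str_extract() returns character values. A short sketch of storing the result as an integer instead, assuming the same sample data (rebuilt here so the snippet runs on its own):

```r
library(stringr)

# Sample data rebuilt from the question (assumed)
d <- data.frame(
  name = c("Jack", "Jill", "Mike", "Jan", "Steve"),
  age  = c("43 - 44 Years", "37 - 38 Years", "17 - 19 Years",
           "21 - 22 Years", "25 - 30 Years")
)

# str_extract() gives character; as.integer() makes the new column numeric
d$age_low <- as.integer(str_extract(d$age, "^\\d+"))

# A string that does not start with a digit yields NA rather than an error
no_match <- str_extract("Unknown", "^\\d+")
```

The NA behavior means rows with unexpected formats can be found afterwards with is.na(d$age_low).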
Upvotes: 2
Reputation: 11548
In case some of the age groups have a single digit in the lower range, this will work:
gsub('(\\d\\d?)(.*)', '\\1', ar$age)
Data used:
> ar
# A tibble: 5 x 2
name age
<chr> <chr>
1 Jack 43 - 44 Years
2 Jill 37 - 38 Years
3 Mike 17 - 19 Years
4 Jan 21 - 22 Years
5 Steve 25 - 30 Years
> ar$age1 <- gsub('(\\d\\d?)(.*)','\\1',ar$age)
> ar
# A tibble: 5 x 3
name age age1
<chr> <chr> <chr>
1 Jack 43 - 44 Years 43
2 Jill 37 - 38 Years 37
3 Mike 17 - 19 Years 17
4 Jan 21 - 22 Years 21
5 Steve 25 - 30 Years 25
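The pattern above captures at most two digits. If the lower bound could have more digits, a variant along these lines might be used instead. This is only a sketch: it swaps in sub() with an anchored `^(\\d+)` pattern and adds as.integer(), neither of which is in the answer above, and it rebuilds the sample data as a plain data.frame:

```r
# Sample data rebuilt from the tibble shown above (assumed)
ar <- data.frame(
  name = c("Jack", "Jill", "Mike", "Jan", "Steve"),
  age  = c("43 - 44 Years", "37 - 38 Years", "17 - 19 Years",
           "21 - 22 Years", "25 - 30 Years"),
  stringsAsFactors = FALSE
)

# Keep only the leading run of digits (any length), then convert to integer.
# sub() replaces the first match only, which is enough because the
# pattern consumes the whole string.
ar$age1 <- as.integer(sub("^(\\d+).*", "\\1", ar$age))
ar
```

This needs no extra packages. A value that does not start with a digit is left unchanged by sub(), so as.integer() would turn it into NA with a warning.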
Upvotes: 3