Reputation: 325
I have been given the following table from a data set and need to process it. This is the code I have been using so far:
X_1 | X_2 |
---|---|
<chr> | <chr> |
16% (4/25) | 32% (8/25) |
16% (4/25) | 32% (8/25) |
16% (4/25) | 32% (8/25) |
16% (4/25) | 32% (8/25) |
16% (4/25) | 32% (8/25) |
library(tidyverse)

# Give the six columns descriptive names
names(age) <- c("Age18.25","Age26.35","Age36.45","Age46.55","Age56.65","Agegt65")

# Keep only the text before the "%" in each column
age <- age %>%
  dplyr::select(names(age)) %>%
  dplyr::mutate(Age18.25 = sub('\\%.*', '', Age18.25),
                Age26.35 = sub('\\%.*', '', Age26.35),
                Age36.45 = sub('\\%.*', '', Age36.45),
                Age46.55 = sub('\\%.*', '', Age46.55),
                Age56.65 = sub('\\%.*', '', Age56.65),
                Agegt65  = sub('\\%.*', '', Agegt65))

# Convert every column from character to numeric
age[] <- lapply(age, function(x) as.numeric(x))
head(age)
Is there a better way to do this that I can reuse for the other data frames that need the same treatment? All the data frames have the same makeup and I just want to extract the percentages. However, the columns vary, and the column names have been causing problems with this approach, forcing me to rename them.
This is the output:
X_1 | X_2 |
---|---|
<dbl> | <dbl> |
16 | 32 |
16 | 32 |
Here is the output of dput(head(age)):
structure(list(Age18.25 = c(11, 9, 40, 41, 19, 17), Age26.35 = c(18,
20, 23, 26, 30, 23), Age36.45 = c(18, 28, 17, 19, 12, 22), Age46.55 = c(14,
15, 7, 15, 14, 23), Age56.65 = c(14, 8, 13, 0, 14, 13), Agegt65 = c(25,
20, 0, 0, 12, 1)), row.names = c(NA, 6L), class = "data.frame")
Upvotes: 3
Reputation: 78927
akrun has already provided the best answer; I would like to add an alternative using sub.
Update: removed the unnecessary escape. Thanks to Chris Ruehlemann!
library(dplyr)
age %>%
  mutate(across(everything(), ~ as.numeric(sub("%.*", "", .))))
Output:
X_1 X_2
<dbl> <dbl>
1 16 32
2 16 32
3 16 32
4 16 32
5 16 32
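As a quick illustration of the updated pattern: "%" is not a regular-expression metacharacter, so "%.*" matches the first percent sign and everything after it without any escaping. The sample vector x below is made up for illustration.

```r
x <- c("16% (4/25)", "32% (8/25)")

# "%" needs no escape: "%.*" matches the first "%" and everything after it,
# and sub() replaces that match with the empty string
stripped <- sub("%.*", "", x)
stripped
#> [1] "16" "32"

# The remaining text is a plain number, so as.numeric() converts it cleanly
as.numeric(stripped)
#> [1] 16 32
```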
Upvotes: 3
Reputation: 21400
A one-liner base R solution:
sapply(df, function(x) as.numeric(sub("%.*", "", x)))
X1 X2
[1,] 45.00 566.000
[2,] 12.33 0.009
[3,] 1.00 33.000
Data:
df <- data.frame(
X1 = c("45% (4/25)", "12.33%", "1"),
X2 = c("566", "0.009% (8/66)", "33%")
)
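Note that sapply simplifies its result to a matrix. If you want to keep a data frame and reuse the step on several data frames with different columns, one option is a minimal sketch along these lines: assign back into d[] with lapply (as the question's last step already does) and wrap that in a function. The helper name strip_pct and the extra data frame other are hypothetical, for illustration only.

```r
# Same data as above
df <- data.frame(
  X1 = c("45% (4/25)", "12.33%", "1"),
  X2 = c("566", "0.009% (8/66)", "33%")
)

# Hypothetical helper: in every column, drop everything from the first "%"
# onward and convert to numeric; d[] <- keeps the data.frame class and names
strip_pct <- function(d) {
  d[] <- lapply(d, function(x) as.numeric(sub("%.*", "", x)))
  d
}

res <- strip_pct(df)
class(res)
#> [1] "data.frame"

# Hypothetical second data frame with a different column layout
other <- data.frame(Agegt65 = c("25% (5/20)", "0% (0/20)"))

# Apply the same cleaning to every data frame in a list
cleaned <- lapply(list(df = df, other = other), strip_pct)
```

Because the helper never refers to column names, it avoids the renaming step from the question.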
Upvotes: 3
Reputation: 887088
We can loop across all the columns, remove the substring with str_remove, and convert to numeric:
library(dplyr)
library(stringr)
age <- age %>%
mutate(across(everything(), ~ as.numeric(str_remove(., '%.*'))))
Another option is parse_number from readr:
age %>%
mutate(across(everything(), ~ readr::parse_number(.)))
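parse_number keeps the first number in each string, which is why it works on values such as 16% (4/25). If readr is not available, a rough base R approximation is to extract the first run of digits with regexpr and regmatches. This is a swapped-in technique, not equivalent to parse_number (which also handles things such as negative numbers and grouping marks), and the sample vector x is made up for illustration.

```r
x <- c("16% (4/25)", "32% (8/25)", "12.33%")

# regexpr() locates the first run of digits with an optional decimal part;
# regmatches() returns just those matched substrings
first_num <- as.numeric(regmatches(x, regexpr("[0-9]+(\\.[0-9]+)?", x)))
first_num
#> [1] 16.00 32.00 12.33
```

One design caveat: regmatches silently drops elements with no match, so the result can be shorter than the input, whereas sub("%.*", "", x) always preserves the length.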
Upvotes: 3