EnlightenedFunky

Reputation: 325

Better way to do this in R

I have been given the following table from a data set, and I need to process it. This is the code I have been using so far:

X_1        X_2
<chr>      <chr>
16% (4/25) 32% (8/25)
16% (4/25) 32% (8/25)
16% (4/25) 32% (8/25)
16% (4/25) 32% (8/25)
16% (4/25) 32% (8/25)

library(tidyverse)
names(age) <- c("Age18.25","Age26.35","Age36.45","Age46.55","Age56.65","Agegt65")
age <- age %>%
  dplyr::select(names(age)) %>% 
  dplyr::mutate( Age18.25 = sub('\\%.*', '', Age18.25),
    Age26.35 = sub('\\%.*', '', Age26.35),
    Age36.45 = sub('\\%.*', '', Age36.45),
    Age46.55 = sub('\\%.*', '', Age46.55),
    Age56.65 = sub('\\%.*', '', Age56.65),
    Agegt65 = sub('\\%.*', '', Agegt65))
age[] <- lapply(age, function(x) as.numeric(x))
head(age)

Is there a better way to do this, so that I can apply it to the other data frames that need the same treatment? All the data frames have the same makeup, and I just want to extract the percentages. However, the number of columns varies, and the column names have been causing problems when I do it this way, which forces me to rename them.

This is the output:

X_1 X_2
<dbl> <dbl>
16 32
16 32

Here is the output of dput(head(age)):

structure(list(Age18.25 = c(11, 9, 40, 41, 19, 17), Age26.35 = c(18, 
20, 23, 26, 30, 23), Age36.45 = c(18, 28, 17, 19, 12, 22), Age46.55 = c(14, 
15, 7, 15, 14, 23), Age56.65 = c(14, 8, 13, 0, 14, 13), Agegt65 = c(25, 
20, 0, 0, 12, 1)), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 3

Views: 118

Answers (3)

TarJae

Reputation: 78927

akrun has already provided the best answer. I would like to add a version that uses sub. Update: removed the unnecessary escape before % in the regex. Thanks to Chris Ruehlemann!

library(dplyr)
age %>%
  mutate(across(everything(), ~ as.numeric(sub("%.*", "",.))))

Output:

    X_1   X_2
  <dbl> <dbl>
1    16    32
2    16    32
3    16    32
4    16    32
5    16    32

Upvotes: 3

Chris Ruehlemann

Reputation: 21400

A one-liner base R solution:

sapply(df, function(x) as.numeric(sub("%.*", "", x)))
        X1      X2
[1,] 45.00 566.000
[2,] 12.33   0.009
[3,]  1.00  33.000

Data:

df <- data.frame(
  X1 = c("45% (4/25)", "12.33%", "1"),
  X2 = c("566", "0.009% (8/66)", "33%")
)
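
Note that sapply returns a numeric matrix here, not a data frame. If the data frame structure should be kept, one possible variation is to assign into df[] with lapply. This is a minimal sketch reusing the same df as above, not part of the original one-liner:

```r
df <- data.frame(
  X1 = c("45% (4/25)", "12.33%", "1"),
  X2 = c("566", "0.009% (8/66)", "33%")
)

# lapply returns one cleaned numeric vector per column;
# assigning into df[] keeps the data.frame class and the column names.
df[] <- lapply(df, function(x) as.numeric(sub("%.*", "", x)))

str(df)
```

Because nothing in the call refers to a column by name, the same line should work on data frames with different column names or a different number of columns.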

Upvotes: 3

akrun

Reputation: 887088

We can loop across all the columns, remove the substring with str_remove, and convert to numeric:

library(dplyr)
library(stringr)
age <- age %>%
   mutate(across(everything(), ~ as.numeric(str_remove(., '%.*')))) 

Another option is parse_number from readr:

age %>%
   mutate(across(everything(), ~ readr::parse_number(.)))
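
To reuse this on the other data frames mentioned in the question, the str_remove version can be wrapped in a small function and applied to a list of data frames. The sketch below is only an illustration: the function name extract_percent and the second data frame (other) are made up for the example and do not come from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: keep only the number before the first "%" in every column.
# across(everything()) avoids naming columns, so no renaming is needed.
extract_percent <- function(data) {
  data %>%
    mutate(across(everything(), ~ as.numeric(str_remove(.x, "%.*"))))
}

# Example inputs (made up for illustration) with different names and column counts.
age <- data.frame(
  X_1 = c("16% (4/25)", "16% (4/25)"),
  X_2 = c("32% (8/25)", "32% (8/25)")
)
other <- data.frame(
  A = "40% (10/25)",
  B = "60% (15/25)",
  C = "0% (0/25)"
)

# Apply the same cleaning step to every data frame in a named list.
cleaned <- lapply(list(age = age, other = other), extract_percent)
cleaned$age
```

Keeping the data frames in a named list means each cleaned result can be pulled out by name afterwards, for example cleaned$other.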

Upvotes: 3
