Tina
Tina

Reputation: 21

Extract multiple numeric value from a data frame column and store in separate columns

I have a data frame like this

df <- data.frame(Income = c("$100to$200","under$100","above$1000"))

I would like this as output

df_final <- data.frame(Avg = c(150,100,1000))

I would like to extract the numeric value from the income column, if there are two numbers, take the average, if there is only one number, take that number.

Upvotes: 2

Views: 145

Answers (4)

one
one

Reputation: 3902

df %>%
  transmute(
    Avg = stringr::str_extract_all(Income, "(?<=\\$)\\d+") %>%
      lapply(as.numeric) %>% 
      sapply(mean)
  )
   Avg
1  150
2  100
3 1000

Upvotes: 0

Andre Wildberg
Andre Wildberg

Reputation: 19088

A stringr approach using gsub to get the numerics, str_squish to remove the white space and str_split to get the entries in case of more then one value.

library(stringr)

data.frame(Avg = sapply(
  str_split(str_squish(gsub("[[:alpha:]$]", " ", df$Income)), " "), function(x) 
    sum(as.numeric(x)) / length(x)))
   Avg
1  150
2  100
3 1000

Upvotes: 0

Anoushiravan R
Anoushiravan R

Reputation: 21908

You could try:

library(dplyr)
library(stringr)

df %>% 
  mutate(across(Income, ~ sapply(str_extract_all(.x, '\\d+'), \(x) {strsplit(x, ',') |>
      as.numeric() |> mean()})))

  Income
1    150
2    100
3   1000

Upvotes: 2

Mako212
Mako212

Reputation: 7292

A few key steps here. First we need to clean our data, in this case getting rid of the $ makes thing easier. Then we'll split into a From and To column. Finally we need to convert to numeric and calculate the row means.

library(tidyverse)

df %>% 
  mutate(Income = gsub("$", "", Income, fixed = TRUE)) %>% 
  separate(Income, "to|under|above", into = c("From", "To")) %>% 
  mutate_all(.,as.numeric) %>% 
  mutate(Avg = rowMeans(.,na.rm =TRUE))

  From   To  Avg
1  100  200  150
2   NA  100  100
3   NA 1000 1000

Upvotes: 2

Related Questions