Reputation: 21
I have a data frame like this
df <- data.frame(Income = c("$100to$200","under$100","above$1000"))
I would like this as output
df_final <- data.frame(Avg = c(150,100,1000))
I would like to extract the numeric value from the income column, if there are two numbers, take the average, if there is only one number, take that number.
Upvotes: 2
Views: 145
Reputation: 3902
df %>%
transmute(
Avg = stringr::str_extract_all(Income, "(?<=\\$)\\d+") %>%
lapply(as.numeric) %>%
sapply(mean)
)
Avg
1 150
2 100
3 1000
Upvotes: 0
Reputation: 19088
A stringr
approach using gsub
to get the numerics, str_squish
to remove the white space and str_split
to get the entries in case of more then one value.
library(stringr)
data.frame(Avg = sapply(
str_split(str_squish(gsub("[[:alpha:]$]", " ", df$Income)), " "), function(x)
sum(as.numeric(x)) / length(x)))
Avg
1 150
2 100
3 1000
Upvotes: 0
Reputation: 21908
You could try:
library(dplyr)
library(stringr)
df %>%
mutate(across(Income, ~ sapply(str_extract_all(.x, '\\d+'), \(x) {strsplit(x, ',') |>
as.numeric() |> mean()})))
Income
1 150
2 100
3 1000
Upvotes: 2
Reputation: 7292
A few key steps here. First we need to clean our data, in this case getting rid of the $
makes thing easier. Then we'll split into a From and To column. Finally we need to convert to numeric and calculate the row means.
library(tidyverse)
df %>%
mutate(Income = gsub("$", "", Income, fixed = TRUE)) %>%
separate(Income, "to|under|above", into = c("From", "To")) %>%
mutate_all(.,as.numeric) %>%
mutate(Avg = rowMeans(.,na.rm =TRUE))
From To Avg
1 100 200 150
2 NA 100 100
3 NA 1000 1000
Upvotes: 2