JAdel

Reputation: 1616

How can I transform and aggregate numeric values in a character column?

I have a character column with the following structure:

[9] "€10-€20"           "€10-€60"           "€6-€12"            "€3-€10" 
[737] "CHF 11-CHF 36"     "CHF 3-CHF 10"      "CHF 4-CHF 9"       "CHF 6-CHF 25"

I want to convert each value in the column to a number: the average of the two bounds of the range:

[9] 15           35           9            6.5
[737] 23.5     6.5      6.5       15.5

Code for reproduction:

library(data.table)

example <- c("€10-€20","€10-€60","€6-€12","€3-€10",
             "CHF 11-CHF 36","CHF 3-CHF 10","CHF 4-CHF 9","CHF 6-CHF 25")
dt <- data.table(example)

Upvotes: 6

Views: 179

Answers (2)

PaulS

Reputation: 25528

With a tidyverse approach:

library(tidyverse)
library(data.table)

example <- c("€10-€20","€10-€60","€6-€12","€3-€10",
             "CHF 11-CHF 36","CHF 3-CHF 10","CHF 4-CHF 9","CHF 6-CHF 25")
dt <- data.table(example)

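dt %>%
  # split each range on "-" into value1 and value2, keeping the original column
  separate(example, into = str_c("value", 1:2), sep = "-", remove = FALSE) %>%
  # strip the currency prefix and parse the remaining number
  mutate(across(-1, ~ parse_number(.x))) %>%
  # average the two bounds, then drop the helper columns
  mutate(value = rowMeans(.[,-1]), across(value1:value2, ~ NULL))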

#>          example value
#> 1:       €10-€20  15.0
#> 2:       €10-€60  35.0
#> 3:        €6-€12   9.0
#> 4:        €3-€10   6.5
#> 5: CHF 11-CHF 36  23.5
#> 6:  CHF 3-CHF 10   6.5
#> 7:   CHF 4-CHF 9   6.5
#> 8:  CHF 6-CHF 25  15.5

Upvotes: 4

akrun

Reputation: 887951

In base R, we can strip everything except digits and the hyphen with gsub, read the result with read.table using sep = "-", and take rowMeans:

rowMeans(read.table(text = gsub("[^0-9-]+", "", dt$example),
    sep = "-", header = FALSE), na.rm = TRUE)
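To make the one-liner above easier to follow, here is the same idea broken into steps. This is only a sketch: the intermediate names (cleaned, bounds, means) are made up for illustration, and unname() is added so the result is a plain numeric vector.

```r
# Same sample values as in the question
example <- c("€10-€20", "€10-€60", "€6-€12", "€3-€10",
             "CHF 11-CHF 36", "CHF 3-CHF 10", "CHF 4-CHF 9", "CHF 6-CHF 25")

# Step 1: drop everything except digits and the hyphen,
# e.g. "€10-€20" becomes "10-20" and "CHF 11-CHF 36" becomes "11-36"
cleaned <- gsub("[^0-9-]+", "", example)

# Step 2: parse the "low-high" strings into two numeric columns (V1, V2)
bounds <- read.table(text = cleaned, sep = "-", header = FALSE)

# Step 3: average the two columns row by row
# (means holds 15, 35, 9, 6.5, 23.5, 6.5, 6.5, 15.5)
means <- unname(rowMeans(bounds, na.rm = TRUE))
```

Note that the pattern also removes decimal points, so this works as written only for whole-number bounds like the ones in the question.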

Or using fread from data.table:

library(data.table)
dt[, Mean := rowMeans(fread(text = gsub("[^0-9-]+", "", example), sep = "-"))]
> dt
         example Mean
1:       €10-€20 15.0
2:       €10-€60 35.0
3:        €6-€12  9.0
4:        €3-€10  6.5
5: CHF 11-CHF 36 23.5
6:  CHF 3-CHF 10  6.5
7:   CHF 4-CHF 9  6.5
8:  CHF 6-CHF 25 15.5
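If the bounds might contain decimals (an assumption, since the question only shows whole numbers), the gsub pattern above would strip the decimal point. A possible variant, sketched here with a hypothetical helper range_mean that is not part of the original answer, extracts the numbers directly with regmatches and gregexpr instead:

```r
# Hypothetical helper: average all numbers found in each string
range_mean <- function(x) {
  # extract every run of digits, optionally followed by a decimal part
  nums <- regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))
  # convert the matches of each string to numeric and average them
  vapply(nums, function(n) mean(as.numeric(n)), numeric(1))
}

# "CHF 3.50-CHF 10" is an invented value, used only to show the decimal case
result <- range_mean(c("€10-€20", "CHF 3.50-CHF 10", "€6-€12"))
# result holds 15, 6.75, 9
```

With the data.table from the question this could be used as dt[, Mean := range_mean(example)]. A string containing no digits yields NaN.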

Upvotes: 5
