Petr
Petr

Reputation: 1847

Dplyr tranformation based on string filtering and conditions

I would like to tranform messy dataset in R,

However I am having issues figuring out how to do so, I provided example dataset and result that I need to achieve:

dataset <- tribble(
  ~ID, ~DESC,
  1, "3+1Â 81Â mÂ", 
  2, "2+1Â 90Â mÂ",
  3, "3+KK 28Â mÂ",
  4, "3+1 120 m (Mezone)")
dataset

dataset_tranformed <- tribble(
  ~ID, ~Rooms, ~Meters, ~Mezone, ~KK,
  1, 4, 81,0, 0,
  2, 3, 90,0,0,
  3, 3, 28,0,1,
  4, 4, 120,1, 0)
dataset_tranformed

columns firstly need to be seperated, however using dataset %>% separate(DESC, c("size", "meters_squared", "Mezone"), sep = " ") does not work because (Mezone) is thrown away.

Upvotes: 1

Views: 43

Answers (1)

akrun
akrun

Reputation: 887691

We can do this by doing evaluation and individually extract the components

library(dplyr)
library(stringr)
library(tidyr)
dataset %>% 
   mutate(Rooms = map_dbl(DESC,  ~
       str_extract(.x, "^\\d+\\+\\d*") %>% 
         str_replace("\\+$", "+0") %>% 
         rlang::parse_expr(.) %>% 
         eval ), 
   Meters = str_extract(DESC, "(?<=\\s)\\d+(?=Â)"),
   Mezone = +(str_detect(DESC, "Mezone")),
   KK = +(str_detect(DESC, "KK"))) %>%
  select(-DESC)
# A tibble: 4 x 5
#     ID Rooms Meters Mezone    KK
#  <dbl> <dbl> <chr>   <int> <int>
#1     1     4 81          0     0
#2     2     3 90          0     0
#3     3     3 28          0     1
#4     4     4 120         1     0

Or another option is extract and then make use of str_detect

dataset %>% 
   extract(DESC, into = c("Rooms1", "Rooms2", "Meters"), 
     "^(\\d+)\\+(\\d*)[^0-9]+(\\d+)", convert = TRUE, remove = FALSE) %>%
   transmute(ID, Mezone = +(str_detect(DESC, "Mezone")),
        KK = +(is.na(Rooms2)), Rooms =  Rooms1 + replace_na(Rooms2, 0), Meters )
# A tibble: 4 x 5
#     ID Mezone    KK Rooms Meters
#  <dbl>  <int> <int> <dbl>  <int>
#1     1      0     0     4     81
#2     2      0     0     3     90
#3     3      0     1     3     28
#4     4      1     0     4    120

Upvotes: 2

Related Questions