Alexander Shemetev
Alexander Shemetev

Reputation: 75

Change metrics inside rows by condition

Suppose a data:

  df1 <- tibble::tribble(~"M1", ~"M2",  ~"Beer, pints", ~"Coffee, oz",  ~"Gasoline, galons",    ~"Milk, galons",    ~"Warehouse, square feet",  ~"Nearest place, miles",    
    "NY",   "22",   "10",   "12",   "15",   "100",  "100",  "20",
    "NY",   "20",   "9",    "10",   "12",   "100",  "100",  "20",
    "NY",   "18",   "8",    "9",    "11",   "100",  "100",  "20",
    "M1",   "M2",   "Beer, liters", "Coffee, cups (120 ml)",    "Gasoline, liters", "Milk, liters", "Warehouse, square meters", "Nearest place, kilometers",
    "PR",   "22",   "7",    "8",    "9",    "70",   "67",   "7",
    "PR",   "20",   "6",    "7",    "8",    "80",   "75",   "7",
    "M1",   "M2",   "Beer, pints",  "Coffee, oz",   "Gasoline, liters", "Milk, liters", "Warehouse, square feet",   "Nearest place, miles",
    "KR",   "22",   "6",    "6",    "7",    "60",   "50",   "9",
    "KR",   "20",   "5",    "6",    "8",    "55",   "65",   "9",
    "KR",   "18",   "5",    "6",    "8",    "50",   "55",   "9")

For visual representation: enter image description here

Is there a nice method to recalculate all columns in the same metrics (like if it is liters, then the entrire column should be liters; if miles (not kilometers), then the entire column to be miles [based on condition in the subheadings inside]? It could be great to think on the nicest methods to solve it.

PS: for information:

1 gallon = 3.78541 liters

1 pint = 0.473176 liters

1 oz = 0.0295735 liters

11 square feet = 1.02193 square meters

1 mile = 1.60934 kilometers

I am just wondering and just started to consider for solution. I am interested to look for possible nice solutions. In addition, it will be interesting for the entire R community to think on the best methods to edit the data by condition.

Upvotes: 2

Views: 58

Answers (1)

Vin&#237;cius F&#233;lix
Vin&#237;cius F&#233;lix

Reputation: 8826

When the data is sloppy, we must also get our hands dirty.I thought of way, with many steps.

Data

df1 <-
structure(list(m1 = c("M1", "NY", "NY", "NY", "M1", "PR", "PR", 
"M1", "KR", "KR", "KR"), m2 = c("M2", "22", "20", "18", "M2", 
"22", "20", "M2", "22", "20", "18"), beer = c("Beer, pints", 
"10", "9", "8", "Beer, liters", "7", "6", "Beer, pints", "6", 
"5", "5"), coffee = c("Coffee, oz", "12", "10", "9", "Coffee, cups (120 ml)", 
"8", "7", "Coffee, oz", "6", "6", "6"), gasoline = c("Gasoline, galons", 
"15", "12", "11", "Gasoline, liters", "9", "8", "Gasoline, liters", 
"7", "8", "8"), milk = c("Milk, galons", "100", "100", "100", 
"Milk, liters", "70", "80", "Milk, liters", "60", "55", "50"), 
    warehouse = c("Warehouse, square feet", "100", "100", "100", 
    "Warehouse, square meters", "67", "75", "Warehouse, square feet", 
    "50", "65", "55"), nearest_place = c("Nearest_place, miles", 
    "20", "20", "20", "Nearest place, kilometers", "7", "7", 
    "Nearest place, miles", "9", "9", "9")), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

Convert function

convert_unit <- function(value,unit){
  
  m <- 
  case_when(
    unit == "galons" ~ 3.78541,
    unit == "pints" ~ 0.473176,
    unit == "oz" ~ 0.0295735,
    unit == "squarefeet" ~ 1.02193/11,
    unit == "miles" ~ 1.02193/11,
    TRUE ~ 1
    )

  output <- m*as.numeric(value)
  
  return(output)
}

Data preparation

First, I would add the header as the first row and also create better names.

library(dplyr)
library(stringr)
library(tidyr)
#remotes::install_github("vbfelix/relper")
library(relper)


or_names <- names(df1)
new_names <- str_to_lower(str_select(or_names,before = ","))
n_row <- nrow(df1)

df1[2:(n_row+1),] <- df1
df1[1,] <- as.list(or_names)
names(df1) <- new_names

Data manipulation

Then, I would create new columns with the units, and the apply the function to each one.

df1 %>% 
  mutate(
    across(.cols = -c(m1:m2),.fns = ~str_keep(str_select(.,after = ",")),.names = "{.col}_unit"),
    aux = beer_unit == "",
    across(.cols = ends_with("_unit"),~if_else(. == "",NA_character_,.))) %>% 
  fill(ends_with("_unit"),.direction = "down") %>%
  filter(aux) %>% 
  mutate(
    across(
      .cols = beer:nearest_place,
      .fns = ~convert_unit(value = .,unit = get(str_c(cur_column(),"_unit")))
      )
  ) %>% 
  select(-aux,-ends_with("_unit"))

Output

# A tibble: 8 x 8
  m1    m2     beer coffee gasoline  milk warehouse nearest_place
  <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>         <dbl>
1 NY    22     4.73  0.355     56.8  379.      9.29         1.86 
2 NY    20     4.26  0.296     45.4  379.      9.29         1.86 
3 NY    18     3.79  0.266     41.6  379.      9.29         1.86 
4 PR    22     7     8          9     70      67            7    
5 PR    20     6     7          8     80      75            7    
6 KR    22     2.84  0.177      7     60       4.65         0.836
7 KR    20     2.37  0.177      8     55       6.04         0.836
8 KR    18     2.37  0.177      8     50       5.11         0.836

Upvotes: 1

Related Questions