How to mutate values of a tibble in long format

Question

I want to normalize using min/max the values of two indicators. Is it possible to do it keeping the tibble in long format? (Below I use left join to do it in wide format).

library(tidyverse)

df <- tibble(ind =c(1, 2),
             `2015` = c(3,10),
             `2016` = c(7,18),
            `2017` = c(1,4))

# long format
df2 <- df %>%
    gather("year", "value", 2:4)

df3 <- df2 %>%
    group_by(ind) %>%
    summarise(mn = min(value),
              mx = max(value))

# wide format? 
df4 <- left_join(df2, df3, by = c("ind"="ind"))

df5 <- df4 %>%
  mutate(value2 = (value-mn)/(mx-mn))
Created on 2019-10-07 by the reprex package (v0.3.0)

akrun · Accepted Answer

Instead of doing the left_join, can create the columns with mutate and avoid the summarise step

library(dplyr)
df2 %>% 
    group_by(ind) %>%
    mutate(mn = min(value), mx = max(value)) %>%
    ungroup %>%
    mutate(value2 = (value - mn)/(mx-mn))

NOTE: Here, we assumed the OP wanted the columns 'mx', 'mn' in the final output. But, if the intention is to get only 'value2', there is no need for creating additional columns as @Gregor mentioned in the comments

df2 %>%
    group_by(ind) %>%
    mutate(value2 = (value - min(value))/(max(value) - min(value)))

Also, with the tidyr_1.0.0, instead of gather, can use pivot_longer which is more generalized as it can deal with multiple sets of columns to reshape from 'wide' to 'long'

library(tidyr)
df %>% 
   pivot_longer(cols = -ind) %>% 
   group_by(ind) %>% 
   mutate(mn = min(value), mx = max(value))  %>%
   ungroup %>%
   mutate(value2 = (value - mn)/(mx-mn))

How to mutate values of a tibble in long format

Answers (1)

Related Questions