Jake Parker
Jake Parker

Reputation: 241

Column name with the min and max values in a dataset in R

I have this data set:

   Year  January February March April   May  June  July August 
   <chr>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>         
 1 2018     45        51    63    61    79    85    88     85         
 2 2017     51        60    65    69    75    82    86     84          
 3 2016     47        55    61    68    72    84    87     85        
... with 20 more rows     

I would like to get the months corresponding to the min and max of each row, and the difference between the max and min. This is my code for the min and max,

x <- colnames(data)[apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y

However, it is giving me Year as a output for some of the which.min function.

   Year    January February March April May  June  July  August   MaxMonth  MinMonth    Diff
   <chr>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>          
 1 2018     45        51    63    61    79    85    88     85      July      January    43
 2 2017     51        60    65    69    75    82    86     84      July      Year       35
 3 2016     47        55    61    68    72    84    87     85      July      Year       40
... with 20 more rows 

Upvotes: 2

Views: 1570

Answers (4)

TarJae
TarJae

Reputation: 78947

library(tidyverse)
df %>% 
  mutate(max_month = pmap(across(January:August), ~ names(c(...)[which.max(c(...))])),
         min_month = pmap(across(January:August), ~ names(c(...)[which.min(c(...))]))
         ) %>% 
    unnest(cols = c(max_month, min_month)) %>%
  rowwise() %>% 
  mutate(Diff = max(c_across(January:August)) - min(c_across(January:August)))

Output:

   Year January February March April   May  June  July August max_month min_month  Diff
  <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <chr>     <chr>     <dbl>
1  2018      45       51    63    61    79    85    88     85 July      January      43
2  2017      51       60    65    69    75    82    86     84 July      January      35
3  2016      47       55    61    68    72    84    87     85 July      January      40

Upvotes: 1

akrun
akrun

Reputation: 887291

We can reshape to long format with pivot_longer, do a group by 'Year', get the column name that correspond to max/min of 'value' (with which.max/which.min) and then join with the original data

library(dplyr)
library(tidyr)
df %>% 
    pivot_longer(cols = -1) %>%
    group_by(Year) %>%
    summarise(maxMonth = name[which.max(value)],
           minMonth = name[which.min(value)]) %>%
    left_join(df, .)
 

Upvotes: 2

elielink
elielink

Reputation: 1202

I think the comment on your post highlights where the issue is

you should rather write

x <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y

Does it work better like this?

Upvotes: 2

Onyambu
Onyambu

Reputation: 79288

No need to do 3 apply functions. You could do:

nms <- names(df)[-1]
n <- seq(nrow(df))
maxMonth = max.col(df[-1])
minMonth = max.col(-df[-1]) 
diff <-  df[-1][cbind(n, maxMonth)] - df[-1][cbind(n, minMonth)]
cbind(df, maxMonth = nms[maxMonth], minMonth = nms[minMonth], diff)

  Year January February March April May June July August maxMonth minMonth diff
1 2018      45       51    63    61  79   85   88     85     July  January   43
2 2017      51       60    65    69  75   82   86     84     July  January   35
3 2016      47       55    61    68  72   84   87     85     July  January   40

Upvotes: 2

Related Questions