Reputation: 241
I have this data set:
Year January February March April May June July August
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018 45 51 63 61 79 85 88 85
2 2017 51 60 65 69 75 82 86 84
3 2016 47 55 61 68 72 84 87 85
... with 20 more rows
I would like to get the months corresponding to the min and max of each row, and the difference between the max and min. This is my code for the min and max,
x <- colnames(data)[apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y
However, it is giving me Year as a output for some of the which.min function.
Year January February March April May June July August MaxMonth MinMonth Diff
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018 45 51 63 61 79 85 88 85 July January 43
2 2017 51 60 65 69 75 82 86 84 July Year 35
3 2016 47 55 61 68 72 84 87 85 July Year 40
... with 20 more rows
Upvotes: 2
Views: 1570
Reputation: 78947
library(tidyverse)
df %>%
mutate(max_month = pmap(across(January:August), ~ names(c(...)[which.max(c(...))])),
min_month = pmap(across(January:August), ~ names(c(...)[which.min(c(...))]))
) %>%
unnest(cols = c(max_month, min_month)) %>%
rowwise() %>%
mutate(Diff = max(c_across(January:August)) - min(c_across(January:August)))
Output:
Year January February March April May June July August max_month min_month Diff
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1 2018 45 51 63 61 79 85 88 85 July January 43
2 2017 51 60 65 69 75 82 86 84 July January 35
3 2016 47 55 61 68 72 84 87 85 July January 40
Upvotes: 1
Reputation: 887291
We can reshape to long format with pivot_longer
, do a group by 'Year', get the column name that correspond to max/min
of 'value' (with which.max/which.min
) and then join with the original data
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -1) %>%
group_by(Year) %>%
summarise(maxMonth = name[which.max(value)],
minMonth = name[which.min(value)]) %>%
left_join(df, .)
Upvotes: 2
Reputation: 1202
I think the comment on your post highlights where the issue is
you should rather write
x <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y
Does it work better like this?
Upvotes: 2
Reputation: 79288
No need to do 3 apply functions. You could do:
nms <- names(df)[-1]
n <- seq(nrow(df))
maxMonth = max.col(df[-1])
minMonth = max.col(-df[-1])
diff <- df[-1][cbind(n, maxMonth)] - df[-1][cbind(n, minMonth)]
cbind(df, maxMonth = nms[maxMonth], minMonth = nms[minMonth], diff)
Year January February March April May June July August maxMonth minMonth diff
1 2018 45 51 63 61 79 85 88 85 July January 43
2 2017 51 60 65 69 75 82 86 84 July January 35
3 2016 47 55 61 68 72 84 87 85 July January 40
Upvotes: 2