user7353167
user7353167

Reputation:

Summarise with dplyr - One variable always on bottom

Can anyone help me with this? I grouped and summarised spendings data from multiple companies, the output looks like this:

df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
    Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
    Spendings = c(1000, 500, 250, 200, 150, 100)
)

  Column1   Column2 Spendings
1   Other      <NA>      1000
2  Brand1 Subbrand1       500
3  Brand2 Subbrand2       250
4  Brand3 Subbrand3       200
5  Brand4 Subbrand4       150
6  Brand5 Subbrand5       100

The "others" row is on top, however I want that specific column on the bottom, because of later visualization (like here)

df <- data.frame(
    Column1 = c("Brand1", "Brand2", "Brand3", "Brand4", "Brand5", "Other"),
    Column2 = c("Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5", NA),
    Spendings = c(500, 250, 200, 150, 100, 1000)
)

  Column1   Column2 Spendings
1  Brand1 Subbrand1       500
2  Brand2 Subbrand2       250
3  Brand3 Subbrand3       200
4  Brand4 Subbrand4       150
5  Brand5 Subbrand5       100
6   Other      <NA>      1000

This is the function I used to create the df with some desired code of me, which obv. does not work :-(.

df <- df%>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(desc(Spendings), lastrow = "others")

Is there a way to get the "others" row on bottom inside the dplyr workflow? Subsetting and rbinding is of course possible, but is there a way which suits better?

Upvotes: 2

Views: 45

Answers (2)

akrun
akrun

Reputation: 887213

We can use a logical vector on arrange and this would result in ordering based on alphabetical order i.e. FALSE comes before TRUE

df %>% 
   arrange(Column1 == "Other")
#  Column1   Column2 Spendings
#1  Brand1 Subbrand1       500
#2  Brand2 Subbrand2       250
#3  Brand3 Subbrand3       200
#4  Brand4 Subbrand4       150
#5  Brand5 Subbrand5       100
#6   Other      <NA>      1000

Another option is to create the column as factor with levels specified in that order so that 'Other' is the last level and if we arrange it would be do the order based on the levels. It might be a better option as it would also be maintained while doing the plot

un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df %>%
    mutate(Column1 = factor(Column1, levels = un1)) %>%
    arrange(Column1)

if we use the forcats package, there are some useful functions fct_relevel to modify the levels easily

library(forcats)
df %>% 
  mutate(Column1 = fct_relevel(Column1, "Other", after = Inf)) %>% 
  arrange(Column1)

According to the examples in ?fct_relevel

Using 'Inf' allows you to relevel to the end when the number of levels is unknown or variable (e.g. vectorised operations)

Upvotes: 2

iod
iod

Reputation: 7592

df <- df%>%
group_by(Column1, Column2) %>%
summarise(Spendings = sum(Spendings)) %>%
arrange(Column1=="Other", desc(Spendings))

Upvotes: 0

Related Questions