small_lebowski
small_lebowski

Reputation: 773

Sorting one variable in a data frame by id

I have a data frame with lot of company information separated by an id variable. I want to sort one of the variables and repeat it for every id. Let's take this example,

df <- structure(list(id = c(110, 110, 110, 90, 90, 90, 90, 252, 252
), var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39), var2 = c(234, 
12, 43, 32, 21, 19, 16, 34, 44)), .Names = c("id", "var1", "var2"
), row.names = c(NA, -9L), class = "data.frame")

Which looks like this

df
   id var1 var2
1 110   26  234
2 110   21   12
3 110   54   43
4  90   10   32
5  90   18   21
6  90    9   19
7  90   16   16
8 252   54   34
9 252   39   44

Now, I want to sort the data frame according to var1 by the vector id. Easiest solution I can think of is using apply function like this,

> apply(df, 2, sort)
       id var1 var2
 [1,]  90    9   12
 [2,]  90   10   16
 [3,]  90   16   19
 [4,]  90   18   21
 [5,] 110   21   32
 [6,] 110   26   34
 [7,] 110   39   43
 [8,] 252   54   44
 [9,] 252   54  234

However, this is not the output I am seeking. The correct output should be,

   id var1 var2
1 110   21   12
2 110   26  234
3 110   54   43
4  90    9   19
5  90   10   32
6  90   16   16
7  90   18   21
8 252   39   44
9 252   54   34

Group by id and sort by var1 column and keep original id column order.

Any idea how to sort like this?

Upvotes: 5

Views: 3800

Answers (5)

moodymudskipper
moodymudskipper

Reputation: 47320

In base R we could use split<- :

split(df,df$id) <- lapply(split(df,df$id), function(x) x[order(x$var1),] )

or as @Markus suggests :

split(df, df$id) <- by(df, df$id, function(x) x[order(x$var1),])

output in either case :

df
#    id var1 var2
# 1 110   21   12
# 2 110   26  234
# 3 110   54   43
# 4  90    9   19
# 5  90   10   32
# 6  90   16   16
# 7  90   18   21
# 8 252   39   44
# 9 252   54   34

Upvotes: 3

thothal
thothal

Reputation: 20379

Note. As mentioned by Moody_Mudskipper, there is no need to use tidyverse and can also be done easily with base R:

df[order(ordered(df$id, unique(df$id)), df$var1), ]

A one-liner tidyverse solution w/o any temp vars:

library(tidyverse)
df %>% arrange(ordered(id, unique(id)), var1)
#    id var1 var2
# 1 110   26  234
# 2 110   21   12
# 3 110   54   43
# 4  90   10   32
# 5  90   18   21
# 6  90    9   19
# 7  90   16   16
# 8 252   54   34
# 9 252   39   44

Explanation of why apply(df, 2, sort) does not work

What you were trying to do is to sort each column independently. apply runs over the specified dimension (2 in this case which corresponds to columns) and applies the function (sort in this case).

apply tries to further simplify the results, in this case to a matrix. So you are getting back a matrix (not a data.frame) where each column is sorted independently. For example this row from the apply call:

# [1,]  90    9   12

does not even exist in the original data.frame.

Upvotes: 9

markus
markus

Reputation: 26343

Another base R option using order and match

df[with(df, order(match(id, unique(id)), var1, var2)), ]
#   id var1 var2
#2 110   21   12
#1 110   26  234
#3 110   54   43
#6  90    9   19
#4  90   10   32
#7  90   16   16
#5  90   18   21
#9 252   39   44
#8 252   54   34

Upvotes: 6

Sotos
Sotos

Reputation: 51592

We can convert the id to factor in order to split while preserving the original order. We can then loop over the list and order, and rbind again, i.e.

df$id <- factor(df$id, levels = unique(df$id))
do.call(rbind, lapply(split(df, df$id), function(i)i[order(i$var1),]))

#       id var1 var2
#110.2 110   21   12
#110.1 110   26  234
#110.3 110   54   43
#90.6   90    9   19
#90.4   90   10   32
#90.7   90   16   16
#90.5   90   18   21
#252.9 252   39   44
#252.8 252   54   34

NOTE: You can reset the rownames by rownames(new_df) <- NULL

Upvotes: 3

Rui Barradas
Rui Barradas

Reputation: 76495

With the following tidyverse pipe, the question's output is reproduced.

library(tidyverse)

df %>%
  mutate(tmp = cumsum(c(0, diff(id) != 0))) %>%
  group_by(id) %>%
  arrange(tmp, var1) %>%
  select(-tmp)
## A tibble: 9 x 3
## Groups:   id [3]
#     id  var1  var2
#  <dbl> <dbl> <dbl>
#1   110    21    12
#2   110    26   234
#3   110    54    43
#4    90     9    19
#5    90    10    32
#6    90    16    16
#7    90    18    21
#8   252    39    44
#9   252    54    34

Upvotes: 2

Related Questions