Reputation: 773
I have a data frame with lot of company information separated by an id variable. I want to sort one of the variables and repeat it for every id. Let's take this example,
df <- structure(list(id = c(110, 110, 110, 90, 90, 90, 90, 252, 252
), var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39), var2 = c(234,
12, 43, 32, 21, 19, 16, 34, 44)), .Names = c("id", "var1", "var2"
), row.names = c(NA, -9L), class = "data.frame")
Which looks like this
df
id var1 var2
1 110 26 234
2 110 21 12
3 110 54 43
4 90 10 32
5 90 18 21
6 90 9 19
7 90 16 16
8 252 54 34
9 252 39 44
Now, I want to sort the data frame according to var1
by the vector id
. Easiest solution I can think of is using apply
function like this,
> apply(df, 2, sort)
id var1 var2
[1,] 90 9 12
[2,] 90 10 16
[3,] 90 16 19
[4,] 90 18 21
[5,] 110 21 32
[6,] 110 26 34
[7,] 110 39 43
[8,] 252 54 44
[9,] 252 54 234
However, this is not the output I am seeking. The correct output should be,
id var1 var2
1 110 21 12
2 110 26 234
3 110 54 43
4 90 9 19
5 90 10 32
6 90 16 16
7 90 18 21
8 252 39 44
9 252 54 34
Group by id
and sort by var1
column and keep original id
column order.
Any idea how to sort like this?
Upvotes: 5
Views: 3800
Reputation: 47320
In base R
we could use split<-
:
split(df,df$id) <- lapply(split(df,df$id), function(x) x[order(x$var1),] )
or as @Markus suggests :
split(df, df$id) <- by(df, df$id, function(x) x[order(x$var1),])
output in either case :
df
# id var1 var2
# 1 110 21 12
# 2 110 26 234
# 3 110 54 43
# 4 90 9 19
# 5 90 10 32
# 6 90 16 16
# 7 90 18 21
# 8 252 39 44
# 9 252 54 34
Upvotes: 3
Reputation: 20379
Note. As mentioned by Moody_Mudskipper, there is no need to use tidyverse
and can also be done easily with base R
:
df[order(ordered(df$id, unique(df$id)), df$var1), ]
A one-liner tidyverse
solution w/o any temp
vars:
library(tidyverse)
df %>% arrange(ordered(id, unique(id)), var1)
# id var1 var2
# 1 110 26 234
# 2 110 21 12
# 3 110 54 43
# 4 90 10 32
# 5 90 18 21
# 6 90 9 19
# 7 90 16 16
# 8 252 54 34
# 9 252 39 44
Explanation of why apply(df, 2, sort)
does not work
What you were trying to do is to sort each column independently. apply
runs over the specified dimension (2
in this case which corresponds to columns) and applies the function (sort
in this case).
apply
tries to further simplify the results, in this case to a matrix. So you are getting back a matrix (not a data.frame
) where each column is sorted independently. For example this row from the apply
call:
# [1,] 90 9 12
does not even exist in the original data.frame
.
Upvotes: 9
Reputation: 26343
Another base R option using order
and match
df[with(df, order(match(id, unique(id)), var1, var2)), ]
# id var1 var2
#2 110 21 12
#1 110 26 234
#3 110 54 43
#6 90 9 19
#4 90 10 32
#7 90 16 16
#5 90 18 21
#9 252 39 44
#8 252 54 34
Upvotes: 6
Reputation: 51592
We can convert the id
to factor in order to split while preserving the original order. We can then loop over the list and order, and rbind
again, i.e.
df$id <- factor(df$id, levels = unique(df$id))
do.call(rbind, lapply(split(df, df$id), function(i)i[order(i$var1),]))
# id var1 var2
#110.2 110 21 12
#110.1 110 26 234
#110.3 110 54 43
#90.6 90 9 19
#90.4 90 10 32
#90.7 90 16 16
#90.5 90 18 21
#252.9 252 39 44
#252.8 252 54 34
NOTE: You can reset the rownames by rownames(new_df) <- NULL
Upvotes: 3
Reputation: 76495
With the following tidyverse
pipe, the question's output is reproduced.
library(tidyverse)
df %>%
mutate(tmp = cumsum(c(0, diff(id) != 0))) %>%
group_by(id) %>%
arrange(tmp, var1) %>%
select(-tmp)
## A tibble: 9 x 3
## Groups: id [3]
# id var1 var2
# <dbl> <dbl> <dbl>
#1 110 21 12
#2 110 26 234
#3 110 54 43
#4 90 9 19
#5 90 10 32
#6 90 16 16
#7 90 18 21
#8 252 39 44
#9 252 54 34
Upvotes: 2