Reputation: 13090
Is there an easier way to ensure that a data frame's rows are ordered according to a "target" vector as the one I implemented in the short example below?
df <- data.frame(name = letters[1:4], value = c(rep(TRUE, 2), rep(FALSE, 2)))
df
# name value
# 1 a TRUE
# 2 b TRUE
# 3 c FALSE
# 4 d FALSE
target <- c("b", "c", "a", "d")
This somehow seems to be a bit too "complicated" to get the job done:
idx <- sapply(target, function(x) {
which(df$name == x)
})
df <- df[idx,]
rownames(df) <- NULL
df
# name value
# 1 b TRUE
# 2 c FALSE
# 3 a TRUE
# 4 d FALSE
Upvotes: 224
Views: 215838
Reputation: 137
Here's a similar system for the situation where you have a variable you want to sort by, initially, but then you want to sort by a secondary variable according to the order that this secondary variable first appears in the initial sort.
In the function below, the initial sort variable is called order_by
and the secondary variable is called order_along
- as in "order by this variable along its initial order".
library(dplyr, warn.conflicts = FALSE)
df <- structure(
list(
msoa11hclnm = c(
"Bewbush", "Tilgate", "Felpham",
"Selsey", "Brunswick", "Ratton", "Ore", "Polegate", "Mile Oak",
"Upperton", "Arundel", "Kemptown"
),
lad20nm = c(
"Crawley", "Crawley",
"Arun", "Chichester", "Brighton and Hove", "Eastbourne", "Hastings",
"Wealden", "Brighton and Hove", "Eastbourne", "Arun", "Brighton and Hove"
),
shape_area = c(
1328821, 3089180, 3540014, 9738033, 448888, 10152663, 5517102,
7036428, 5656430, 2653589, 72832514, 826151
)
),
row.names = c(NA, -12L), class = "data.frame"
)
this does not give me what I need:
df %>%
dplyr::arrange(shape_area, lad20nm)
#> msoa11hclnm lad20nm shape_area
#> 1 Brunswick Brighton and Hove 448888
#> 2 Kemptown Brighton and Hove 826151
#> 3 Bewbush Crawley 1328821
#> 4 Upperton Eastbourne 2653589
#> 5 Tilgate Crawley 3089180
#> 6 Felpham Arun 3540014
#> 7 Ore Hastings 5517102
#> 8 Mile Oak Brighton and Hove 5656430
#> 9 Polegate Wealden 7036428
#> 10 Selsey Chichester 9738033
#> 11 Ratton Eastbourne 10152663
#> 12 Arundel Arun 72832514
Here’s a function:
order_along <- function(df, order_along, order_by) {
cols <- colnames(df)
df <- df %>%
dplyr::arrange({{ order_by }})
df %>%
dplyr::select({{ order_along }}) %>%
dplyr::distinct() %>%
dplyr::full_join(df) %>%
dplyr::select(dplyr::all_of(cols))
}
order_along(df, lad20nm, shape_area)
#> Joining, by = "lad20nm"
#> msoa11hclnm lad20nm shape_area
#> 1 Brunswick Brighton and Hove 448888
#> 2 Kemptown Brighton and Hove 826151
#> 3 Mile Oak Brighton and Hove 5656430
#> 4 Bewbush Crawley 1328821
#> 5 Tilgate Crawley 3089180
#> 6 Upperton Eastbourne 2653589
#> 7 Ratton Eastbourne 10152663
#> 8 Felpham Arun 3540014
#> 9 Arundel Arun 72832514
#> 10 Ore Hastings 5517102
#> 11 Polegate Wealden 7036428
#> 12 Selsey Chichester 9738033
Created on 2021-01-12 by the reprex package (v0.3.0)
Upvotes: 2
Reputation: 547
If you don't want to use any libraries and you have reoccurrences in your data, you can use which
with sapply
as well.
new_order <- sapply(target, function(x,df){which(df$name == x)}, df=df)
df <- df[new_order,]
Upvotes: 4
Reputation: 389235
We can adjust the factor levels based on target
and use it in arrange
library(dplyr)
df %>% arrange(factor(name, levels = target))
# name value
#1 b TRUE
#2 c FALSE
#3 a TRUE
#4 d FALSE
Or order
it and use it in slice
df %>% slice(order(factor(name, levels = target)))
Upvotes: 63
Reputation: 538
I prefer to use ***_join
in dplyr
whenever I need to match data. One possible try for this
left_join(data.frame(name=target),df,by="name")
Note that the input for ***_join
require tbls or data.frame
Upvotes: 30
Reputation: 1383
This method is a bit different, it provided me with a bit more flexibility than the previous answer.
By making it into an ordered factor, you can use it nicely in arrange
and such. I used reorder.factor from the gdata
package.
df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
require(gdata)
df$name <- reorder.factor(df$name, new.order=target)
Next, use the fact that it is now ordered:
require(dplyr)
df %>%
arrange(name)
name value
1 b TRUE
2 c FALSE
3 a TRUE
4 d FALSE
If you want to go back to the original (alphabetic) ordering, just use as.character()
to get it back to the original state.
Upvotes: 19
Reputation: 5537
Try match
:
df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
df[match(target, df$name),]
name value
2 b TRUE
3 c FALSE
1 a TRUE
4 d FALSE
It will work as long as your target
contains exactly the same elements as df$name
, and neither contain duplicate values.
From ?match
:
match returns a vector of the positions of (first) matches of its first argument
in its second.
Therefore match
finds the row numbers that matches target
's elements, and then we return df
in that order.
Upvotes: 307