Reputation: 59
'order' in R seems like 'sort' in Stata. Here's a dataset for example (only variable names listed):
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18
and here's the output I expect:
v1 v2 v3 v4 v5 v7 v8 v9 v10 v11 v12 v17 v18 v13 v14 v15 v6 v16
In R, I have 2 ways:
data <- data[,c(1:5,7:12,17:18,13:15,6,16)]
OR
names <- c("v1", "v2", "v3", "v4", "v5", "v7", "v8", "v9", "v10", "v11", "v12", "v17", "v18", "v13", "v14", "v15", "v6", "v16")
data <- data[names]
To get the same output in Stata, I may run 2 lines:
order v17 v18, before(v13)
order v6 v16, last
In the idealized data above, the variable names reveal their positions. In most real datasets, however, variables have names such as 'age' and 'gender' that carry no positional information, and there may be more than 50 of them. That is where Stata's 'order' is most useful: we don't need to know a variable's exact position, we just type its name:
order age, after(gender)
Is there a base function in R to deal with this issue or could I get a package? Thanks in advance.
tweetinfo <- data.frame(uid=1:50, mid=2:51, annotations=3:52, bmiddle_pic=4:53, created_at=5:54, favorited=6:55, geo=7:56, in_reply_to_screen_name=8:57, in_reply_to_status_id=9:58, in_reply_to_user_id=10:59, original_pic=11:60, reTweetId=12:61, reUserId=13:62, source=14:63, thumbnail_pic=15:64, truncated=16:65)
noretweetinfo <- data.frame(uid=21:50, mid=22:51, annotations=23:52, bmiddle_pic=24:53, created_at=25:54, favorited=26:55, geo=27:56, in_reply_to_screen_name=28:57, in_reply_to_status_id=29:58, in_reply_to_user_id=30:59, original_pic=31:60, reTweetId=32:61, reUserId=33:62, source=34:63, thumbnail_pic=35:64, truncated=36:65)
retweetinfo <- data.frame(uid=41:50, mid=42:51, annotations=43:52, bmiddle_pic=44:53, created_at=45:54, deleted=46:55, favorited=47:56, geo=48:57, in_reply_to_screen_name=49:58, in_reply_to_status_id=50:59, in_reply_to_user_id=51:60, original_pic=52:61, source=53:62, thumbnail_pic=54:63, truncated=55:64)
tweetinfo$type <- "ti"
noretweetinfo$type <- "nr"
retweetinfo$type <- "rt"
gtinfo <- rbind(tweetinfo, noretweetinfo)
gtinfo$deleted=""
gtinfo <- gtinfo[,c(1:16,18,17)]
retweetinfo <- transform(retweetinfo, reTweetId="", reUserId="")
retweetinfo <- retweetinfo[,c(1:5,7:12,17:18,13:15,6,16)]
gtinfo <- rbind(gtinfo, retweetinfo)
write.table(gtinfo, file="C:/gtinfo.txt", row.names=F, col.names=T, sep="\t", quote=F)
# rm(list=ls(all=T))
Upvotes: 5
Views: 2280
Reputation: 6230
The package dplyr and the function dplyr::relocate, a new verb introduced in dplyr 1.0.0, does exactly what you are looking for.
library(dplyr)
data %>% relocate(v17, v18, .before = v13)
data %>% relocate(v6, v16, .after = last_col())
data %>% relocate(age, .after = gender)
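Note that relocate() returns a new data frame rather than modifying data in place, so the result has to be assigned, and the calls can be chained. A minimal sketch that should reproduce the column order asked for in the question, assuming a toy data frame (built here only for illustration) with columns v1 to v18:

```r
library(dplyr)

# Toy one-row data frame with columns v1..v18 (illustrative data only)
data <- as.data.frame(
  matrix(1:18, nrow = 1, dimnames = list(NULL, paste0("v", 1:18)))
)

reordered <- data %>%
  relocate(v17, v18, .before = v13) %>%   # like: order v17 v18, before(v13)
  relocate(v6, v16, .after = last_col())  # like: order v6 v16, last

names(reordered)
```

The second call uses last_col() so that no column position needs to be known in advance.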
Upvotes: 1
Reputation: 193687
Because I'm procrastinating and experimenting with different things, here's a function that I whipped up. Ultimately, it depends on append:
moveme <- function(invec, movecommand) {
  movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]], ",|\\s+"),
                        function(x) x[x != ""])
  movelist <- lapply(movecommand, function(x) {
    Where <- x[which(x %in% c("before", "after", "first", "last")):length(x)]
    ToMove <- setdiff(x, Where)
    list(ToMove, Where)
  })
  myVec <- invec
  for (i in seq_along(movelist)) {
    temp <- setdiff(myVec, movelist[[i]][[1]])
    A <- movelist[[i]][[2]][1]
    if (A %in% c("before", "after")) {
      ba <- movelist[[i]][[2]][2]
      if (A == "before") {
        after <- match(ba, temp)-1
      } else if (A == "after") {
        after <- match(ba, temp)
      }
    } else if (A == "first") {
      after <- 0
    } else if (A == "last") {
      after <- length(myVec)
    }
    myVec <- append(temp, values = movelist[[i]][[1]], after = after)
  }
  myVec
}
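To see the mechanism the function relies on, here is a minimal sketch of a single "before" move done directly with append. The names nms, to_move, rest and new_order are made up for this illustration and are not part of the function above:

```r
# Hypothetical example names, not taken from a real dataset
nms <- paste0("v", 1:18)
to_move <- c("v17", "v18")

# Drop the names being moved, then splice them back in just before "v13"
rest <- setdiff(nms, to_move)
new_order <- append(rest, to_move, after = match("v13", rest) - 1)
new_order
```

The target position is looked up in rest, after the moved names have been removed, so the index stays correct wherever the moved names originally sat.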
Here's some sample data representing the names of your dataset:
x <- paste0("v", 1:18)
Imagine now that we wanted "v17" and "v18" before "v3", "v6" and "v16" at the end, and "v5" at the beginning:
moveme(x, "v17, v18 before v3; v6, v16 last; v5 first")
# [1] "v5" "v1" "v2" "v17" "v18" "v3" "v4" "v7" "v8" "v9" "v10" "v11" "v12"
# [14] "v13" "v14" "v15" "v6" "v16"
So, the obvious usage would be, for a data.frame named "df":
df[moveme(names(df), "how you want to move the columns")]
And, for a data.table named "DT" (which, as @mnel points out, would be more memory efficient):
setcolorder(DT, moveme(names(DT), "how you want to move the columns"))
Note that compound moves are specified by semicolons.
The recognized moves are:
before (move the specified columns to before another named column)
after (move the specified columns to after another named column)
first (move the specified columns to the first position)
last (move the specified columns to the last position)
Upvotes: 3
Reputation: 115485
You could write your own function that does this.
The following will give you the new order for your column names, using syntax similar to Stata's.
where is a named list with 4 possibilities:
list(last = T)
list(first = T)
list(before = x), where x is the variable name in question
list(after = x), where x is the variable name in question
sorted = T will sort var_list lexicographically (a combination of alphabetic and sequential from the Stata command).
The function works on the names only (once you pass a data.frame object as data) and returns a reordered vector of names, e.g.:
stata.order <- function(var_list, where, sorted = F, data) {
  all_names = names(data)
  # are all the variable names in var_list present in data?
  check <- var_list %in% all_names
  if (any(!check)) {
    stop("Not all variables in var_list exist within data")
  }
  if (names(where) == "before") {
    if (!(where %in% all_names)) {
      stop("before variable not in the data set")
    }
  }
  if (names(where) == "after") {
    if (!(where %in% all_names)) {
      stop("after variable not in the data set")
    }
  }
  if (sorted) {
    var_list <- sort(var_list)
  }
  where_in <- which(all_names %in% var_list)
  full_list <- seq_along(data)
  others <- full_list[-c(where_in)]
  .nwhere <- names(where)
  if (!(.nwhere %in% c("last", "first", "before", "after"))) {
    stop("where must be a list of a named element first, last, before or after")
  }
  do_what <- switch(names(where), last = length(others), first = 0,
                    before = which(all_names[others] == where) - 1,
                    after = which(all_names[others] == where))
  new_order <- append(others, where_in, do_what)
  return(all_names[new_order])
}
tmp <- as.data.frame(matrix(1:100, ncol = 10))
stata.order(var_list = c("V2", "V5"), where = list(last = T), data = tmp)
## [1] "V1" "V3" "V4" "V6" "V7" "V8" "V9" "V10" "V2" "V5"
stata.order(var_list = c("V2", "V5"), where = list(first = T), data = tmp)
## [1] "V2" "V5" "V1" "V3" "V4" "V6" "V7" "V8" "V9" "V10"
stata.order(var_list = c("V2", "V5"), where = list(before = "V6"), data = tmp)
## [1] "V1" "V3" "V4" "V2" "V5" "V6" "V7" "V8" "V9" "V10"
stata.order(var_list = c("V2", "V5"), where = list(after = "V4"), data = tmp)
## [1] "V1" "V3" "V4" "V2" "V5" "V6" "V7" "V8" "V9" "V10"
# throws an error
stata.order(var_list = c("V2", "V5"), where = list(before = "v11"), data = tmp)
## Error: before variable not in the data set
If you want to do the reordering memory-efficiently (by reference, without copying), use data.table:
library(data.table)
DT <- data.table(tmp)
# sets by reference, no copying
setcolorder(DT, stata.order(var_list = c("V2", "V5"), where = list(after = "V4"),
data = DT))
DT
## V1 V3 V4 V2 V5 V6 V7 V8 V9 V10
## 1: 1 21 31 11 41 51 61 71 81 91
## 2: 2 22 32 12 42 52 62 72 82 92
## 3: 3 23 33 13 43 53 63 73 83 93
## 4: 4 24 34 14 44 54 64 74 84 94
## 5: 5 25 35 15 45 55 65 75 85 95
## 6: 6 26 36 16 46 56 66 76 86 96
## 7: 7 27 37 17 47 57 67 77 87 97
## 8: 8 28 38 18 48 58 68 78 88 98
## 9: 9 29 39 19 49 59 69 79 89 99
## 10: 10 30 40 20 50 60 70 80 90 100
Upvotes: 2
Reputation: 132969
This should give you the same file:
#snip
gtinfo <- rbind(tweetinfo, noretweetinfo)
gtinfo$deleted=""
retweetinfo <- transform(retweetinfo, reTweetId="", reUserId="")
gtinfo <- rbind(gtinfo, retweetinfo)
gtinfo <-gtinfo[,c(1:16,18,17)]
#snip
It is possible to implement a function like Stata's order command in R, but I don't think there is much demand for that.
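The column-reordering step before the second rbind can be dropped because rbind() on data frames matches columns by name rather than by position. A small sketch with two made-up data frames, a and b, whose columns are in different orders:

```r
# Two illustrative data frames with the same columns in different orders
a <- data.frame(x = 1:2, y = c("a", "b"))
b <- data.frame(y = c("c", "d"), x = 3:4)

# Columns of b are matched to those of a by name; the result keeps a's order
combined <- rbind(a, b)
combined
```

The result takes its column order from the first argument, which is why a single reordering at the very end is enough.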
Upvotes: 0
Reputation: 2895
I get your problem. I now have code to offer:
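move <- function(data, variable, before) {
  # Names of all columns except the one being moved, in their current order
  r <- names(data)[names(data) != variable]
  # Position of the target among the remaining columns (not in the original data)
  i <- match(before, r)
  pre <- r[seq_len(i - 1)]
  post <- r[i:length(r)]
  data[c(pre, variable, post)]
}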
# Example.
library(MASS)
data(painters)
str(painters)
# Move 'Expression' variable before 'Drawing' variable.
new <- move(painters,"Expression","Drawing")
View(new)
Upvotes: 2
Reputation: 10606
It is very unclear what you would like to do, but your first sentence makes me assume you would like to sort a dataset.
Actually, there is a built-in order function, which returns the indices of the ordered sequence. Is this what you are looking for?
> x <- c(3,2,1)
> order(x)
[1] 3 2 1
> x[order(x)]
[1] 1 2 3
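For context, order() is typically used to sort the rows of a data frame, whereas changing column positions, as the question asks, is done by indexing with names. A short sketch contrasting the two, using a made-up data frame df:

```r
# Made-up data frame for illustration
df <- data.frame(id = c(3, 1, 2), age = c(30, 25, 41), gender = c("f", "m", "f"))

# order() sorts rows: the rows of df arranged by increasing id
sorted_rows <- df[order(df$id), ]

# Column positions are changed by indexing with names instead:
# this puts age after gender
moved_cols <- df[c("id", "gender", "age")]
```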
Upvotes: 0