Reputation: 601
This question pertains to reordering of columns in a large data frame, for example, having about 800 columns. The data frame has many column names preceding different dates for each id (i.e. first column). Similar questions appear online (e.g. Reordering columns in data frame once again and Reordering columns in large data frame) but their specifics do not fit into my case. A sample of the data set is
df <-
structure(
list(
id = c(1L, 2L, 3L, 4L,5L),
date1 = c("1/4/2004", "3/8/2004", "NA", "13/10/2004","11/3/2003"),
ax=c(1,2,1,"NA",5),
am=c(1,0,1,0,0),
aq=c(0,0,1,1,1),
date2 = c("8/6/2002", "11/5/2004", "3/5/2004",
"25/11/2004","21/1/2004"),
bx=c(3,2,6,1,5),
bm=c(1,1,0,1,1),
bq=c(1,0,1,0,0),
date3=c("23/6/2006", "24/12/2006", "18/2/2006", "NA","NA"),
cx=c(1,2,4,1,0),
cm=c(1,1,0,1,1),
cq=c(1,0,1,0,0)
),
.Names = c("id",
"date1","ax","am","aq","date2","bx","bm","bq","date3","cx","cm","cq"),
class = "data.frame",
row.names = c(NA,-5L)
)
I want to reorder the columns such that we have "am","aq","ax"; "bm","bq","bx" and "cm","cq","cx" following the date1; date2 and date3, respectively. For this small scenario example, I have tried
df1<-df[,c(1,2,4,5,3,6,8,9,7,10,12,13,11)]
This code works well and it produces the expected results below
df1
id date1 am aq ax date2 bm bq bx date3 cm cq cx
1 1 1/4/2004 1 0 1 8/6/2002 1 1 3 23/6/2006 1 1 1
2 2 3/8/2004 0 0 2 11/5/2004 1 0 2 24/12/2006 1 0 2
3 3 NA 1 1 1 3/5/2004 0 1 6 18/2/2006 0 1 4
4 4 13/10/2004 0 1 NA 25/11/2004 1 0 1 NA 1 0 1
5 5 11/3/2003 0 1 5 21/1/2004 1 0 5 NA 1 0 0
However, I am looking for a much handy code that would be easy on large data. Any help is greatly appreciated.
Upvotes: 2
Views: 892
Reputation: 34291
If your complete data follows the pattern you've outlined you can recycle a vector of position adjustments like so:
df[c(1, (2:ncol(df) + c(0,1,1,-2)))]
id date1 am aq ax date2 bm bq bx date3 cm cq cx
1 1 1/4/2004 1 0 1 8/6/2002 1 1 3 23/6/2006 1 1 1
2 2 3/8/2004 0 0 2 11/5/2004 1 0 2 24/12/2006 1 0 2
3 3 NA 1 1 1 3/5/2004 0 1 6 18/2/2006 0 1 4
4 4 13/10/2004 0 1 NA 25/11/2004 1 0 1 NA 1 0 1
5 5 11/3/2003 0 1 5 21/1/2004 1 0 5 NA 1 0 0
Explanation:
The pattern is to keep the date in place, move second and third columns forward one, and move the fourth back two. We can create a vector of this:
adj.pattern <- c(0,1,1,-2)
Because R recycles shorter vectors to match the length of longer ones we can apply it easily to the index of column positions from position 2 to the number of columns in the data frame 2:ncol(df)
, which gives
col.index <- 2:ncol(df) + adj.pattern
col.index
[1] 2 4 5 3 6 8 9 7 10 12 13 11
Then we use this index to order the data frame (adding 1
at the start for the ID column):
df[c(1, col.index)]
Upvotes: 2
Reputation: 388797
If you want to keep the id
and date
columns fixed and sort
the remaining columns within themselves based on name, we can do
#1:ncol(df)
all_cols <- seq_len(ncol(df))
#Get indices of fixed columns
fixed_columns <- c(1, grep("date", names(df)))
#Get the name of columns apart from fixed ones
cols <- names(df)[-fixed_columns]
#Sort and match them and update the new order in all_cols
all_cols[-fixed_columns] <- match(sort(cols), names(df))
df[all_cols]
# id date1 am aq ax date2 bm bq bx date3 cm cq cx
#1 1 1/4/2004 1 0 1 8/6/2002 1 1 3 23/6/2006 1 1 1
#2 2 3/8/2004 0 0 2 11/5/2004 1 0 2 24/12/2006 1 0 2
#3 3 NA 1 1 1 3/5/2004 0 1 6 18/2/2006 0 1 4
#4 4 13/10/2004 0 1 NA 25/11/2004 1 0 1 NA 1 0 1
#5 5 11/3/2003 0 1 5 21/1/2004 1 0 5 NA 1 0 0
Upvotes: 1