Reputation: 601

A handy means of reordering columns in large data frame using R

This question pertains to reordering of columns in a large data frame, for example, having about 800 columns. The data frame has many column names preceding different dates for each id (i.e. first column). Similar questions appear online (e.g. Reordering columns in data frame once again and Reordering columns in large data frame) but their specifics do not fit into my case. A sample of the data set is

df <-
structure(
list(
  id = c(1L, 2L, 3L, 4L,5L),
  date1 = c("1/4/2004", "3/8/2004", "NA", "13/10/2004","11/3/2003"),
  ax=c(1,2,1,"NA",5),
  am=c(1,0,1,0,0),
  aq=c(0,0,1,1,1),
  date2 = c("8/6/2002", "11/5/2004", "3/5/2004", 
"25/11/2004","21/1/2004"),
  bx=c(3,2,6,1,5),
  bm=c(1,1,0,1,1),
  bq=c(1,0,1,0,0),
  date3=c("23/6/2006", "24/12/2006", "18/2/2006", "NA","NA"),
  cx=c(1,2,4,1,0),
  cm=c(1,1,0,1,1),
  cq=c(1,0,1,0,0)
 ),
.Names = c("id", 
"date1","ax","am","aq","date2","bx","bm","bq","date3","cx","cm","cq"),
class = "data.frame",
row.names = c(NA,-5L)
)

I want to reorder the columns such that we have "am","aq","ax"; "bm","bq","bx" and "cm","cq","cx" following the date1; date2 and date3, respectively. For this small scenario example, I have tried

df1<-df[,c(1,2,4,5,3,6,8,9,7,10,12,13,11)]

This code works well and it produces the expected results below

df1
  id      date1 am aq ax      date2 bm bq bx      date3 cm cq cx
1  1   1/4/2004  1  0  1   8/6/2002  1  1  3  23/6/2006  1  1  1
2  2   3/8/2004  0  0  2  11/5/2004  1  0  2 24/12/2006  1  0  2
3  3         NA  1  1  1   3/5/2004  0  1  6  18/2/2006  0  1  4
4  4 13/10/2004  0  1 NA 25/11/2004  1  0  1         NA  1  0  1
5  5  11/3/2003  0  1  5  21/1/2004  1  0  5         NA  1  0  0

However, I am looking for a much handy code that would be easy on large data. Any help is greatly appreciated.

Upvotes: 2

Answers (2)

lroha

Reputation: 34291

If your complete data follows the pattern you've outlined you can recycle a vector of position adjustments like so:

df[c(1, (2:ncol(df) + c(0,1,1,-2)))]

  id      date1 am aq ax      date2 bm bq bx      date3 cm cq cx
1  1   1/4/2004  1  0  1   8/6/2002  1  1  3  23/6/2006  1  1  1
2  2   3/8/2004  0  0  2  11/5/2004  1  0  2 24/12/2006  1  0  2
3  3         NA  1  1  1   3/5/2004  0  1  6  18/2/2006  0  1  4
4  4 13/10/2004  0  1 NA 25/11/2004  1  0  1         NA  1  0  1
5  5  11/3/2003  0  1  5  21/1/2004  1  0  5         NA  1  0  0

Explanation:

The pattern is to keep the date in place, move second and third columns forward one, and move the fourth back two. We can create a vector of this:

adj.pattern <- c(0,1,1,-2)

Because R recycles shorter vectors to match the length of longer ones we can apply it easily to the index of column positions from position 2 to the number of columns in the data frame 2:ncol(df), which gives

col.index <- 2:ncol(df) + adj.pattern
col.index

[1]  2  4  5  3  6  8  9  7 10 12 13 11

Then we use this index to order the data frame (adding 1 at the start for the ID column):

df[c(1, col.index)]

Upvotes: 2

Ronak Shah

Reputation: 388797

If you want to keep the id and date columns fixed and sort the remaining columns within themselves based on name, we can do

#1:ncol(df)
all_cols <- seq_len(ncol(df))
#Get indices of fixed columns
fixed_columns <- c(1, grep("date", names(df)))
#Get the name of columns apart from fixed ones
cols <- names(df)[-fixed_columns]
#Sort and match them and update the new order in all_cols
all_cols[-fixed_columns] <- match(sort(cols), names(df))
df[all_cols]

#  id      date1 am aq ax      date2 bm bq bx      date3 cm cq cx
#1  1   1/4/2004  1  0  1   8/6/2002  1  1  3  23/6/2006  1  1  1
#2  2   3/8/2004  0  0  2  11/5/2004  1  0  2 24/12/2006  1  0  2
#3  3         NA  1  1  1   3/5/2004  0  1  6  18/2/2006  0  1  4
#4  4 13/10/2004  0  1 NA 25/11/2004  1  0  1         NA  1  0  1
#5  5  11/3/2003  0  1  5  21/1/2004  1  0  5         NA  1  0  0

Upvotes: 1

A handy means of reordering columns in large data frame using R

Answers (2)

Related Questions