user3062260

R: sort df rows according to a vector of different length

I would like to sort a df so that its rows appear in the order given by a vector. I tried the approach below, but instead of simply re-ordering the whole df, it returns rows that line up one-to-one with the vector.

My df looks like this:

> head(df)
     POSITION MEANDEPTH CHROM
1     0:10000         0  chr1
2 10000:20000         0  chr1
3 20000:30000         0  chr1
4 30000:40000         0  chr1
5 40000:50000         0  chr1
6 50000:60000         0  chr1
> tail(df)
                POSITION MEANDEPTH CHROM
308834 57170000:57180000         0  chrY
308835 57180000:57190000         0  chrY
308836 57190000:57200000         0  chrY
308837 57200000:57210000         0  chrY
308838 57210000:57220000         0  chrY
308839 57220000:57230000         0  chrY

> levels(df$CHROM)
 [1] "chr1"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr2"  "chr20" "chr21" "chr22" "chr3"  "chr4" 
[18] "chr5"  "chr6"  "chr7"  "chr8"  "chr9"  "chrM"  "chrX"  "chrY"

I would like to re-order the df according to df$CHROM so that the rows are in the following order:

# RE_ORDER CHROMS
chrom_order <- c('chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11',
               'chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrM')

I have tried:

df <- df[match(chrom_order, df$CHROM),]

but the rows were reordered as follows:

> head(df)
       POSITION MEANDEPTH CHROM
1       0:10000         0  chr1
128716  0:10000         0  chr2
169134  0:10000         0  chr3
188964  0:10000         0  chr4
207986  0:10000         0  chr5
226140  0:10000         0  chr6

I want the df arranged so that all the chr1 rows appear together, then all the chr2 rows, then chr3, etc., following the vector chrom_order.
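A small sketch of why the first attempt returns one row per chromosome, using a hypothetical four-row stand-in for df: `match(x, table)` returns, for each element of `x`, the position of its first match in `table`. So `match(chrom_order, df$CHROM)` has one entry per chromosome in the vector, not one entry per row of df.

```r
# Hypothetical toy stand-in for the question's df
df <- data.frame(
  POSITION = c("0:10000", "10000:20000", "0:10000", "10000:20000"),
  MEANDEPTH = 0,
  CHROM = c("chr1", "chr1", "chr2", "chr2"),
  stringsAsFactors = FALSE
)
chrom_order <- c("chr1", "chr2")

# One index per element of chrom_order: the FIRST row holding that chromosome
idx <- match(chrom_order, df$CHROM)
print(idx)  # 1 3

# Subsetting with idx therefore keeps only one row per chromosome
first_rows <- df[idx, ]
print(nrow(first_rows))  # 2
```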

I also tried:

library(dplyr)
df %>%
  slice(match(CHROM, chrom_order))

But this didn't work either. I thought about subsetting the df once for each value of df$CHROM and then re-joining the pieces in the order I want, but that seems long-winded and inefficient. Is there a quick fix?
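The `match()` idea can work if the arguments are swapped and the result is passed to `order()` instead of being used as row numbers. A minimal base-R sketch on hypothetical toy data: `match(df$CHROM, chrom_order)` gives one entry per row (the rank of that row's chromosome in the vector), and `order()` sorts the rows by that rank.

```r
# Hypothetical toy data in the same layout as the question's df
df <- data.frame(
  POSITION = c("0:10000", "0:10000", "10000:20000", "0:10000"),
  MEANDEPTH = 0,
  CHROM = c("chr2", "chr10", "chr2", "chr1"),
  stringsAsFactors = FALSE
)
chrom_order <- c("chr1", "chr2", "chr10")

# Rank of each row's chromosome within chrom_order (NA if absent)
chrom_rank <- match(df$CHROM, chrom_order)

# order() returns the row permutation that sorts by rank; ties keep their
# original relative order, and NA ranks are placed last by default
sorted <- df[order(chrom_rank), ]
print(sorted$CHROM)  # "chr1" "chr2" "chr2" "chr10"
```

With dplyr the same idea would be `df %>% arrange(match(CHROM, chrom_order))`. The `slice()` attempt fails because `slice()` treats the match values as row positions rather than as sort keys.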


Answers (1)

Gregor Thomas

Just set the order of the levels:

df$CHROM = factor(df$CHROM, levels = chrom_order)

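Then you can order your data frame on this column, since the order of the levels is now part of the factor. POSITION holds strings such as "10000:20000", so break ties on the numeric start of each window rather than on the string itself (as strings, "100000:110000" sorts before "20000:30000"):

df[order(df$CHROM, as.numeric(sub(":.*", "", df$POSITION))), ]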
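A self-contained sketch of the factor approach on hypothetical toy data. It also illustrates two details worth checking on the real data: chromosomes missing from chrom_order (the question's vector has no "chrY") become NA and are placed last by `order()`, and ties are broken on the numeric start of POSITION. The `pos_start` helper vector is my own addition for illustration, not part of the answer.

```r
df <- data.frame(
  POSITION = c("100000:110000", "20000:30000", "0:10000", "0:10000", "0:10000"),
  MEANDEPTH = 0,
  CHROM = c("chr2", "chr2", "chrY", "chr10", "chr1"),
  stringsAsFactors = FALSE
)
chrom_order <- c("chr1", "chr2", "chr10", "chrX", "chrM")  # no "chrY", as in the question

# Values not listed in `levels` become NA
df$CHROM <- factor(df$CHROM, levels = chrom_order)
print(anyNA(df$CHROM))  # TRUE (the chrY row)

# Numeric start of each window, e.g. "20000:30000" -> 20000
pos_start <- as.numeric(sub(":.*", "", df$POSITION))

# Sort by level order, then by window start; NA chromosomes go last (na.last = TRUE)
sorted <- df[order(df$CHROM, pos_start), ]
print(sorted)
```

If the original row order is already sorted by position within each chromosome, `df[order(df$CHROM), ]` alone should suffice, because R's `order()` is stable (except for `method = "quick"`) and leaves unresolved ties in their original order.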

Side note: not sure if you manually typed the order you want. If so, you might want to do something like this in the future:

chrom_order = c(paste0("chr", 1:22), "chrX", "chrM")
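A quick check, as a sketch, that the generated vector equals the hand-typed one from the question, and of why a custom order is needed at all: by default `factor()` sorts its levels as strings, which puts "chr10" before "chr2" (exactly what `levels(df$CHROM)` showed in the question).

```r
chrom_order <- c(paste0("chr", 1:22), "chrX", "chrM")

# The vector typed by hand in the question
typed <- c('chr1','chr2','chr3','chr4','chr5','chr6','chr7','chr8','chr9','chr10','chr11',
           'chr12','chr13','chr14','chr15','chr16','chr17','chr18','chr19','chr20','chr21','chr22','chrX','chrM')
print(identical(chrom_order, typed))  # TRUE

# Default factor levels are sorted as strings, not by chromosome number
default_levels <- levels(factor(c("chr2", "chr10", "chr1")))
print(default_levels)  # "chr1" "chr10" "chr2"
```

If chrY rows should keep a place in the ordering instead of becoming NA, "chrY" can simply be appended to the vector at the desired position.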
