HoHoHo

Reputation: 311

Reorder levels of factor names in data frame

I have a data frame DF with just one column, DF$A. It is a factor whose names I need to reorder in a particular way:

l
pheno
l.ldl.a
m.ldl.b
s.ldl.c
x.vldl.b
l.vldl.c
m.vldl.d
s.vldl.f
xs.vldl.h
xxl.vldl.a
xl.hdl.a
l.hdl.b
m.hdl.c
s.hdl.d

I am trying to create a second column containing DF$A reordered by two parts of each name, following this order:

reorderLevels <- c("XXL.VLDL", "XL.VLDL", "L.VLDL", "M.VLDL", "S.VLDL", "XS.VLDL",
                   "IDL", "L.LDL", "M.LDL", "S.LDL", "XL.HDL", "L.HDL", "M.HDL", "S.HDL")

without caring about the last part of the names.

I know how to reorder by the first part of the name (before the first dot) or by the second part (between the dots), but I don't know how to arrange by both parts at once.

So far I can reorder it with the following command, but only by one part of the name:

library(dplyr)

# pheno is a factor, so convert it to character before strsplit()
l1 <- l %>% mutate(m2 = match(sapply(strsplit(as.character(pheno), "[.]"),
                          function(x) x[1]), reorderLevels)) %>%
            arrange(m2) %>%
            select(-m2)
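For reference, a minimal sketch of how a command like the one above might extend to two keys: arrange() sorts by its first argument and uses later arguments only to break ties. It assumes lowercase ordering vectors (first.ordering and middle.ordering, hypothetical names) spelled the same way as the data, and rebuilds the example data frame l so the snippet runs on its own.

```r
library(dplyr)

# Hypothetical copy of the question's data frame `l`
l <- data.frame(pheno = factor(c("l.ldl.a", "m.ldl.b", "s.ldl.c", "x.vldl.b",
                                 "l.vldl.c", "m.vldl.d", "s.vldl.f", "xs.vldl.h",
                                 "xxl.vldl.a", "xl.hdl.a", "l.hdl.b", "m.hdl.c",
                                 "s.hdl.d")))

# Assumed lowercase orderings for the first and middle parts of the names
first.ordering  <- c("xxl", "xl", "l", "m", "s", "x", "xs")
middle.ordering <- c("vldl", "idl", "ldl", "hdl")

l1 <- l %>%
  mutate(first.part  = sapply(strsplit(as.character(pheno), ".", fixed = TRUE), `[`, 1),
         middle.part = sapply(strsplit(as.character(pheno), ".", fixed = TRUE), `[`, 2)) %>%
  # sort by the middle part's position first, then break ties by the first part
  arrange(match(middle.part, middle.ordering), match(first.part, first.ordering)) %>%
  select(-first.part, -middle.part)

print(l1)
```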

Upvotes: 1

Views: 127

Answers (3)

Erin

Reputation: 386

I'm not 100% sure what you're asking. I think you want to reorder the factor DF$A according to the following scheme:

  • top priority: arrange by middle section, with the following order:

    middle.ordering = c('vldl', 'idl', 'ldl', 'hdl')
    
  • second priority: arrange by the first section, with the following order (I added an 'x' even though it's not in your reorderLevels, because you have an 'x' in DF$A):

    first.ordering = c('xxl', 'xl', 'l', 'm', 's', 'x', 'xs')
    
  • you don't care about the order of the last section, but I only know how to solve this easily if we specify one, so I'm picking an arbitrary order of the letters I can see in the last section:

    last.ordering = c('a', 'b', 'c', 'd', 'f', 'h')
    
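The three priorities above can also be applied directly in base R without enumerating every combination: rank each section with match() against its ordering vector and pass the ranks to order(), highest priority first. This is a sketch that assumes every name has exactly three dot-separated sections; A is a hypothetical stand-in for DF$A.

```r
# Hypothetical stand-in for DF$A
A <- factor(c("l.ldl.a", "m.ldl.b", "s.ldl.c", "x.vldl.b", "l.vldl.c",
              "m.vldl.d", "s.vldl.f", "xs.vldl.h", "xxl.vldl.a", "xl.hdl.a",
              "l.hdl.b", "m.hdl.c", "s.hdl.d"))

first.ordering  <- c("xxl", "xl", "l", "m", "s", "x", "xs")
middle.ordering <- c("vldl", "idl", "ldl", "hdl")
last.ordering   <- c("a", "b", "c", "d", "f", "h")

# One row per name, one column per dot-separated section
parts <- do.call(rbind, strsplit(as.character(A), ".", fixed = TRUE))

# order() sorts by its first key and only uses later keys to break ties
idx <- order(match(parts[, 2], middle.ordering),
             match(parts[, 1], first.ordering),
             match(parts[, 3], last.ordering))
sorted.names <- as.character(A)[idx]
print(sorted.names)
```

With this data the last key never comes into play, because no two names share both a middle and a first section; it only matters if such ties appear.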

As for your final output, I'm not sure what you want. I can think of four possible things you might want:

  • DF$A in the exact order you wrote it in, but with new levels in the order you want. This would be useful if you make plots of these data, since the plots would be arranged according to the factor levels. It also means that if you have other columns in the data frame, you can keep the pairings across all of the rows the same. That would look like this:

     [1] l.ldl.a    m.ldl.b    s.ldl.c    x.vldl.b   l.vldl.c   m.vldl.d   s.vldl.f   xs.vldl.h  xxl.vldl.a xl.hdl.a   l.hdl.b    m.hdl.c   
    [13] s.hdl.d   
    Levels: xxl.vldl.a l.vldl.c m.vldl.d s.vldl.f x.vldl.b xs.vldl.h l.ldl.a m.ldl.b s.ldl.c xl.hdl.a l.hdl.b m.hdl.c s.hdl.d
    
  • DF$A in a new order, but with the same alphabetically ordered levels as before (e.g. level 1 would correspond to l.hdl.b because that's the first element of DF$A alphabetically). That would look like this:

     [1] xxl.vldl.a l.vldl.c   m.vldl.d   s.vldl.f   x.vldl.b   xs.vldl.h  l.ldl.a    m.ldl.b    s.ldl.c    xl.hdl.a   l.hdl.b    m.hdl.c   
    [13] s.hdl.d   
    Levels: l.hdl.b l.ldl.a l.vldl.c m.hdl.c m.ldl.b m.vldl.d s.hdl.d s.ldl.c s.vldl.f xl.hdl.a xs.vldl.h x.vldl.b xxl.vldl.a
    
  • DF$A in a new order, with new levels. That would look like this:

     [1] xxl.vldl.a l.vldl.c   m.vldl.d   s.vldl.f   x.vldl.b   xs.vldl.h  l.ldl.a    m.ldl.b    s.ldl.c    xl.hdl.a   l.hdl.b    m.hdl.c   
    [13] s.hdl.d   
    Levels: xxl.vldl.a l.vldl.c m.vldl.d s.vldl.f x.vldl.b xs.vldl.h l.ldl.a m.ldl.b s.ldl.c xl.hdl.a l.hdl.b m.hdl.c s.hdl.d
    
  • You might also want the factor to have more possible levels than are actually realized in DF$A, e.g. if you're going to add more data later. If that's the case, your output would look like this, with all of the possible combinations of the three sections accounted for:

     [1] l.ldl.a    m.ldl.b    s.ldl.c    x.vldl.b   l.vldl.c   m.vldl.d   s.vldl.f   xs.vldl.h  xxl.vldl.a xl.hdl.a   l.hdl.b    m.hdl.c   
    [13] s.hdl.d   
    168 Levels: xxl.vldl.a xxl.vldl.b xxl.vldl.c xxl.vldl.d xxl.vldl.f xxl.vldl.h xl.vldl.a xl.vldl.b xl.vldl.c xl.vldl.d xl.vldl.f ... xs.hdl.h
    

If one of those is what you want, here's a way to get each of them:

DF = data.frame(A=factor(c(
  'l.ldl.a',
  'm.ldl.b',
  's.ldl.c',
  'x.vldl.b',
  'l.vldl.c',
  'm.vldl.d',
  's.vldl.f',
  'xs.vldl.h',
  'xxl.vldl.a',
  'xl.hdl.a',
  'l.hdl.b',
  'm.hdl.c',
  's.hdl.d')))

first.ordering = c('xxl', 'xl', 'l', 'm', 's', 'x', 'xs')
middle.ordering = c('vldl', 'idl', 'ldl', 'hdl')
last.ordering = c('a', 'b', 'c', 'd', 'f', 'h')

# make a big cartesian product of the orderings,
# making sure that the top-priority orderings are mentioned *last*
# in expand.grid (which varies its first argument fastest)
complete.ordering = with(
  expand.grid(last.ordering, first.ordering, middle.ordering),
  paste(Var2, Var3, Var1, sep='.'))
new.levels = complete.ordering[complete.ordering %in% DF$A]

A.with.new.levels.but.same.order = factor(DF$A, levels=new.levels)
A.with.new.order.but.same.levels = DF$A[order(as.numeric(A.with.new.levels.but.same.order))]
A.with.new.order.and.levels = factor(A.with.new.order.but.same.levels, levels=new.levels)
A.with.same.order.and.more.levels = factor(DF$A, levels=complete.ordering)
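Mentioning the top-priority ordering last works because expand.grid() varies its first argument fastest and its last argument slowest. A toy illustration with hypothetical two-element orderings named low and high:

```r
# Hypothetical toy orderings: `high` should dominate, so it goes last
g <- expand.grid(low = c("a", "b"), high = c("x", "y"), stringsAsFactors = FALSE)

# Rows come out as (a, x), (b, x), (a, y), (b, y)
combos <- paste(g$high, g$low, sep = ".")
print(combos)
# [1] "x.a" "x.b" "y.a" "y.b"
```

Read top to bottom, the labels are sorted by high and then by low, which is the same effect the code above gets by passing middle.ordering last.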

Also, if your original data frame had had more columns, for example if it looked like this:

            A another.column
1     l.ldl.a              1
2     m.ldl.b              2
3     s.ldl.c              3
4    x.vldl.b              4
5    l.vldl.c              5
6    m.vldl.d              6
7    s.vldl.f              7
8   xs.vldl.h              8
9  xxl.vldl.a              9
10   xl.hdl.a             10
11    l.hdl.b             11
12    m.hdl.c             12
13    s.hdl.d             13

And you wanted to rearrange the order of all the rows together, preserving the associations among the elements of each row, then you could do the following:

A.with.new.levels.but.same.order = factor(DF$A, levels=new.levels)
DF.with.new.order = DF[order(as.numeric(A.with.new.levels.but.same.order)),]

This would give you the following data frame:

            A another.column
9  xxl.vldl.a              9
5    l.vldl.c              5
6    m.vldl.d              6
7    s.vldl.f              7
4    x.vldl.b              4
8   xs.vldl.h              8
1     l.ldl.a              1
2     m.ldl.b              2
3     s.ldl.c              3
10   xl.hdl.a             10
11    l.hdl.b             11
12    m.hdl.c             12
13    s.hdl.d             13

Upvotes: 4

bytesinflight

Reputation: 1694

I'd like to propose tidyr and dplyr for this as an alternative.

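library(dplyr)
library(tidyr)

DF %>%
 separate("A", c("first", "middle", "last"), sep="[.]") %>%
 arrange(middle, first) %>%
 unite(A, c(first, middle, last), sep=".") %>%
 mutate(A=factor(A, levels=unique(A)))   # as.factor() would sort the levels alphabetically again

First we separate the three parts, arrange the rows by them, and then unite them again. Lastly we rebuild the factor so that its levels follow this new order.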

This gives

            A
1     l.hdl.b
2     m.hdl.c
3     s.hdl.d
4    xl.hdl.a
5     l.ldl.a
6     m.ldl.b
7     s.ldl.c
8    l.vldl.c
9    m.vldl.d
10   s.vldl.f
11   x.vldl.b
12  xs.vldl.h
13 xxl.vldl.a

Slightly longer than the levels answer, but perhaps more readable.
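The pipeline above sorts the sections alphabetically, so hdl comes before ldl and vldl. If the custom order from the question is wanted instead, one option (a sketch, assuming the same lowercase ordering vectors used in the first answer) is to wrap the sort keys in factor() with explicit levels, since arrange() sorts factors by level order rather than alphabetically. Using unique(A) as the final levels keeps them in the new row order.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(A = factor(c("l.ldl.a", "m.ldl.b", "s.ldl.c", "x.vldl.b",
                              "l.vldl.c", "m.vldl.d", "s.vldl.f", "xs.vldl.h",
                              "xxl.vldl.a", "xl.hdl.a", "l.hdl.b", "m.hdl.c",
                              "s.hdl.d")))

# Assumed custom orderings (lowercase, matching the data)
first.ordering  <- c("xxl", "xl", "l", "m", "s", "x", "xs")
middle.ordering <- c("vldl", "idl", "ldl", "hdl")

out <- DF %>%
  separate(A, c("first", "middle", "last"), sep = "[.]") %>%
  # factor keys make arrange() follow the custom level order
  arrange(factor(middle, levels = middle.ordering),
          factor(first, levels = first.ordering)) %>%
  unite(A, c(first, middle, last), sep = ".") %>%
  # levels in order of first appearance, i.e. the new row order
  mutate(A = factor(A, levels = unique(A)))

print(out)
```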

Upvotes: 2

alistaire

Reputation: 43334

If you want to reorder by, say, the second part and then the first (the levels are already sorted by the first part, then the second), pass order() the parts of the label you care about, in order of importance. You can use sub() to pull out the pieces:

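# factor() with a new levels= order reorders the levels without changing the data;
# assigning to levels(DF$A) directly would relabel the values instead
DF$A <- factor(DF$A, levels = levels(DF$A)[order(sub('.*\\.(.*)\\..*', '\\1', levels(DF$A)),
                                                 sub('\\..*', '', levels(DF$A)))])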

levels(DF$A)
# [1] "l.hdl.b"    "m.hdl.c"    "s.hdl.d"    "xl.hdl.a"   "l.ldl.a"    "m.ldl.b"    "s.ldl.c"   
# [8] "l.vldl.c"   "m.vldl.d"   "s.vldl.f"   "x.vldl.b"   "xs.vldl.h"  "xxl.vldl.a"

Note that the hdl entries come first, and within each group the entries are sorted by the first part.
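A related pitfall: assigning a permuted vector with levels(x) <- ... renames the factor's underlying integer codes instead of moving them, so the data values change, whereas factor(x, levels = ...) only changes the level order. A minimal illustration with a hypothetical two-level factor f:

```r
f <- factor(c("a", "b", "b"))

relabelled <- f
levels(relabelled) <- rev(levels(f))             # renames codes: values change
reordered <- factor(f, levels = rev(levels(f)))  # reorders levels: values kept

print(as.character(relabelled))  # "b" "a" "a"
print(as.character(reordered))   # "a" "b" "b"
```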

Upvotes: 1
