Reputation: 311
I have a data frame DF with just one column: DF$A. It is a factor with names which I need to reorder in a particular way:
l
pheno
l.ldl.a
m.ldl.b
s.ldl.c
x.vldl.b
l.vldl.c
m.vldl.d
s.vldl.f
xs.vldl.h
xxl.vldl.a
xl.hdl.a
l.hdl.b
m.hdl.c
s.hdl.d
I am trying to create a second column with DF$A reordered according to two parts of each name, in this order:
reorderLevels <- c("XXL.VLDL", "XL.VLDL", "L.VLDL", "M.VLDL", "S.VLDL", "XS.VLDL",
                   "IDL", "L.LDL", "M.LDL", "S.LDL", "XL.HDL", "L.HDL", "M.HDL", "S.HDL")
without caring about the last part of the names.
I know how to reorder according to the first part of the name (before the first dot) or according to the second part (between the dots), but I don't know how to arrange according to both parts at once.
So far I can reorder it with the following command, but only according to one part of the name:
l1 <- l %>%
  mutate(m2 = match(sapply(strsplit(l$pheno, "[.]"), function(x) x[1]),
                    reorderLevels)) %>%
  arrange(m2) %>%
  select(-m2)
Upvotes: 1
Views: 127
Reputation: 386
I'm not 100% sure what you're asking. I think you want to reorder the factor DF$A according to the following scheme:
Top priority: arrange by the middle section, with the following order:
middle.ordering = c('vldl', 'idl', 'ldl', 'hdl')
Second priority: arrange by the first section, with the following order (I added an 'x' even though it's not in your reorderLevels, because you have an 'x' in DF$A):
first.ordering = c('xxl', 'xl', 'l', 'm', 's', 'x', 'xs')
You don't care about the order of the last section, but I only know how to solve this easily if we specify one, so I'm picking an arbitrary order of the letters I can see in the last section:
last.ordering = c('a', 'b', 'c', 'd', 'f', 'h')
As for your final output, I don't know what you want. I can think of 4 possible things you might want:
DF$A in the exact order you wrote it in, but with new levels in the order you want. This would be useful if you make plots of these data, since the plots would be arranged according to the factor levels. It also means that if you have other columns in the data frame, the pairings across all of the rows stay the same. That would look like this:
[1] l.ldl.a m.ldl.b s.ldl.c x.vldl.b l.vldl.c m.vldl.d s.vldl.f xs.vldl.h xxl.vldl.a xl.hdl.a l.hdl.b m.hdl.c
[13] s.hdl.d
Levels: xxl.vldl.a l.vldl.c m.vldl.d s.vldl.f x.vldl.b xs.vldl.h l.ldl.a m.ldl.b s.ldl.c xl.hdl.a l.hdl.b m.hdl.c s.hdl.d
DF$A in a new order, but with the same alphabetically ordered levels as before (e.g. level 1 would correspond to l.hdl.b because that's the first element of DF$A alphabetically). That would look like this:
[1] xxl.vldl.a l.vldl.c m.vldl.d s.vldl.f x.vldl.b xs.vldl.h l.ldl.a m.ldl.b s.ldl.c xl.hdl.a l.hdl.b m.hdl.c
[13] s.hdl.d
Levels: l.hdl.b l.ldl.a l.vldl.c m.hdl.c m.ldl.b m.vldl.d s.hdl.d s.ldl.c s.vldl.f xl.hdl.a xs.vldl.h x.vldl.b xxl.vldl.a
DF$A in a new order, with new levels. That would look like this:
[1] xxl.vldl.a l.vldl.c m.vldl.d s.vldl.f x.vldl.b xs.vldl.h l.ldl.a m.ldl.b s.ldl.c xl.hdl.a l.hdl.b m.hdl.c
[13] s.hdl.d
Levels: xxl.vldl.a l.vldl.c m.vldl.d s.vldl.f x.vldl.b xs.vldl.h l.ldl.a m.ldl.b s.ldl.c xl.hdl.a l.hdl.b m.hdl.c s.hdl.d
You might also want the factor to have more possible levels than are actually realized in DF$A, e.g. if you're going to add more data later. If that's the case, then your output would look like this, with all of the possible combinations of the three sections accounted for:
[1] l.ldl.a m.ldl.b s.ldl.c x.vldl.b l.vldl.c m.vldl.d s.vldl.f xs.vldl.h xxl.vldl.a xl.hdl.a l.hdl.b m.hdl.c
[13] s.hdl.d
168 Levels: xxl.vldl.a xxl.vldl.b xxl.vldl.c xxl.vldl.d xxl.vldl.f xxl.vldl.h xl.vldl.a xl.vldl.b xl.vldl.c xl.vldl.d xl.vldl.f ... xs.hdl.h
If one of those things is what you want, then here's a way to do each of those things:
DF = data.frame(A=factor(c(
'l.ldl.a',
'm.ldl.b',
's.ldl.c',
'x.vldl.b',
'l.vldl.c',
'm.vldl.d',
's.vldl.f',
'xs.vldl.h',
'xxl.vldl.a',
'xl.hdl.a',
'l.hdl.b',
'm.hdl.c',
's.hdl.d')))
first.ordering = c('xxl', 'xl', 'l', 'm', 's', 'x', 'xs')
middle.ordering = c('vldl', 'idl', 'ldl', 'hdl')
last.ordering = c('a', 'b', 'c', 'd', 'f', 'h')
# make a big Cartesian product of the orderings,
# making sure that the top-priority orderings are mentioned *last*
# in expand.grid (its first argument varies fastest)
complete.ordering = with(
  expand.grid(last.ordering, first.ordering, middle.ordering),
  paste(Var2, Var3, Var1, sep='.'))
new.levels = complete.ordering[complete.ordering %in% DF$A]
A.with.new.levels.but.same.order = factor(DF$A, levels=new.levels)
A.with.new.order.but.same.levels = DF$A[order(as.numeric(A.with.new.levels.but.same.order))]
A.with.new.order.and.levels = factor(A.with.new.order.but.same.levels, levels=new.levels)
A.with.same.order.and.more.levels = factor(DF$A, levels=complete.ordering)
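The same two-priority ordering can also be sketched without building the full Cartesian product, by ranking each part of the name with match() and passing the ranks to order() in order of priority. This is only an illustration under the assumptions above: the name A.releveled is made up here, and the last section is ignored rather than given an arbitrary order.

```r
# Data and orderings copied from the answer above ('x' is an assumed addition).
A <- factor(c('l.ldl.a', 'm.ldl.b', 's.ldl.c', 'x.vldl.b', 'l.vldl.c',
              'm.vldl.d', 's.vldl.f', 'xs.vldl.h', 'xxl.vldl.a',
              'xl.hdl.a', 'l.hdl.b', 'm.hdl.c', 's.hdl.d'))
first.ordering  <- c('xxl', 'xl', 'l', 'm', 's', 'x', 'xs')
middle.ordering <- c('vldl', 'idl', 'ldl', 'hdl')

# Split every level into its dot-separated sections.
parts  <- strsplit(levels(A), '.', fixed = TRUE)
first  <- sapply(parts, `[`, 1)
middle <- sapply(parts, `[`, 2)

# Rank by the middle section first, then break ties with the first section.
new.levels <- levels(A)[order(match(middle, middle.ordering),
                              match(first, first.ordering))]

# Rebuild the factor with the new level order; the data values are unchanged.
A.releveled <- factor(A, levels = new.levels)
levels(A.releveled)
```

Because factor() is called on the existing values, only the level order changes; sorting the rows afterwards works as before with order(as.numeric(A.releveled)).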
Also, if your original data frame had had more columns, for example if it looked like this:
A another.column
1 l.ldl.a 1
2 m.ldl.b 2
3 s.ldl.c 3
4 x.vldl.b 4
5 l.vldl.c 5
6 m.vldl.d 6
7 s.vldl.f 7
8 xs.vldl.h 8
9 xxl.vldl.a 9
10 xl.hdl.a 10
11 l.hdl.b 11
12 m.hdl.c 12
13 s.hdl.d 13
And you wanted to rearrange the order of all the rows together, preserving the associations among the elements of each row, then you could do the following:
A.with.new.levels.but.same.order = factor(DF$A, levels=new.levels)
DF.with.new.order = DF[order(as.numeric(A.with.new.levels.but.same.order)),]
This would give you the following data frame:
A another.column
9 xxl.vldl.a 9
5 l.vldl.c 5
6 m.vldl.d 6
7 s.vldl.f 7
4 x.vldl.b 4
8 xs.vldl.h 8
1 l.ldl.a 1
2 m.ldl.b 2
3 s.ldl.c 3
10 xl.hdl.a 10
11 l.hdl.b 11
12 m.hdl.c 12
13 s.hdl.d 13
Upvotes: 4
Reputation: 1694
I'd like to propose tidyr and dplyr for this as an alternative.
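library(tidyr)
library(dplyr)

DF %>%
  separate("A", c("first", "middle", "last"), sep="[.]") %>%
  arrange(middle, first) %>%
  unite(A, c(first, middle, last), sep=".") %>%
  mutate(A=factor(A, levels=unique(A)))
First we separate the three parts, arrange by the middle and first parts (alphabetically), and then unite them again. Lastly we rebuild the factor with levels=unique(A) so that the levels follow the new row order (as.factor would sort them alphabetically).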
This gives
A
1 l.hdl.b
2 m.hdl.c
3 s.hdl.d
4 xl.hdl.a
5 l.ldl.a
6 m.ldl.b
7 s.ldl.c
8 l.vldl.c
9 m.vldl.d
10 s.vldl.f
11 x.vldl.b
12 xs.vldl.h
13 xxl.vldl.a
Slightly longer than the levels answer, but perhaps more readable.
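The pipeline above sorts the middle and first parts alphabetically (hdl, ldl, vldl), whereas the question asks for a custom order. A minimal sketch of one way to get that, assuming the first.ordering and middle.ordering vectors from the earlier answer (including its assumed 'x' entry), is to pass match() positions to arrange(). The name DF.sorted is made up for this example.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(A = factor(c('l.ldl.a', 'm.ldl.b', 's.ldl.c', 'x.vldl.b',
                              'l.vldl.c', 'm.vldl.d', 's.vldl.f', 'xs.vldl.h',
                              'xxl.vldl.a', 'xl.hdl.a', 'l.hdl.b', 'm.hdl.c',
                              's.hdl.d')))
first.ordering  <- c('xxl', 'xl', 'l', 'm', 's', 'x', 'xs')
middle.ordering <- c('vldl', 'idl', 'ldl', 'hdl')

DF.sorted <- DF %>%
  mutate(A = as.character(A)) %>%
  # keep A and add the three sections as helper columns
  separate(A, c("first", "middle", "last"), sep = "[.]", remove = FALSE) %>%
  # sort by position in the custom orderings, middle section first
  arrange(match(middle, middle.ordering), match(first, first.ordering)) %>%
  # make the factor levels follow the new row order
  mutate(A = factor(A, levels = unique(A))) %>%
  select(-first, -middle, -last)

DF.sorted
```

Using remove = FALSE keeps the original label, so no unite() step is needed, and any other columns in the data frame travel with their rows.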
Upvotes: 2
Reputation: 43334
If you want to reorder by, say, the second part and then the first (they're already ordered first then second), pass order the parts of the label you care about in order of importance. You can use sub to pull out the pieces:
# factor() with explicit levels reorders the levels without relabelling the data
DF$A <- factor(DF$A, levels = levels(DF$A)[order(sub('.*\\.(.*)\\..*', '\\1', levels(DF$A)),
                                                 sub('\\..*', '', levels(DF$A)))])
levels(DF$A)
# [1] "l.hdl.b" "m.hdl.c" "s.hdl.d" "xl.hdl.a" "l.ldl.a" "m.ldl.b" "s.ldl.c"
# [8] "l.vldl.c" "m.vldl.d" "s.vldl.f" "x.vldl.b" "xs.vldl.h" "xxl.vldl.a"
Note the hdls are first, with the ordering within each group sorted by the first part.
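One detail worth checking when reordering levels in base R: assigning a vector to levels(x) renames the levels by position, whereas factor(x, levels = ...) changes their order and leaves the values alone. A small sketch with a made-up three-value factor shows the difference:

```r
x <- factor(c('b', 'a', 'c'))            # levels are a, b, c

# Assignment to levels() relabels: level 1 ('a') is renamed 'c', and so on.
relabelled <- x
levels(relabelled) <- c('c', 'b', 'a')
as.character(relabelled)                 # "b" "c" "a": the data have changed

# factor() with explicit levels only reorders the levels.
reordered <- factor(x, levels = c('c', 'b', 'a'))
as.character(reordered)                  # "b" "a" "c": same data as x
levels(reordered)                        # "c" "b" "a"
```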
Upvotes: 1