anrpet
anrpet

Reputation: 139

R: ordering a dataset with consistent elements in ouput

I have this dataset:

X1         X2
123        ABC
123        ABC
123        ABD 
123        ABD
123        LNE
123        LNS

I am using this code to produce a list of most frequent pair per id number:

 newdf = setDT(df)[, .N, .(X1, X2)][order(factor(X2, levels = 
          unique(X1)), -N),.(X1 =toString(unique(X2)[1:2])) , X1]

which produces a list of id numbers and their corresponding top pairs:

X1     X2
123    ABC,ABD

My problem is that the 3 letter codes in the pairs often switch positions, making the elements inconsistent with each other. For Example:

X1     X2
123    ABC, ABD
124    ABD, ABC 

How can I ensure that the pairs are consistent, like this:

X1    X2
123   ABC, ABD
124   ABC, ABD

Upvotes: 0

Views: 34

Answers (1)

A Gore
A Gore

Reputation: 1900

This might work:

mydf <- data.table(structure(list(X1 = c(123L, 123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L), X2 = structure(c(1L,
1L, 2L, 2L, 3L, 4L, 1L, 2L, 2L), .Label = c("ABC", "ABD", "LNE", "LNS"), class = "factor")), .Names = c("X1",
 "X2"), class = "data.frame", row.names = c(NA, -9L)))
mydf[,X2:=factor(X2)]
mydf[,.N,by=.(X1,X2)][,.(X2=toString(sort(X2[order(N,decreasing = TRUE)][1:2]))),by=.(X1)]
# X1       X2
# 1: 123 ABC, ABD
# 2: 124 ABC, ABD

This works because if you call sort on a factor variable then it will sort it based on levels.

Upvotes: 1

Related Questions