Reputation: 43
I have a data frame in R that records the ranked preferences of customers for a number of different brands. A sample of the data frame looks something like the table below. The actual table is much larger in both dimensions (80,000 x 30 approximately).
The table I have:
+------+---------+---------+---------+---------+
| User | Brand_A | Brand_B | Brand_C | Brand_D |
+------+---------+---------+---------+---------+
| A | 1 | NA | 3 | 2 |
| B | NA | NA | NA | 1 |
| C | 3 | 2 | 4 | 1 |
| D | NA | 1 | 2 | NA |
+------+---------+---------+---------+---------+
where 1 indicates the customer ranks the brand as the 'best' and NA indicates that the customer has not ranked the brand. I would like to create a table that selects, for each user, the Top 3 (or Top N) ranked brands and outputs a table that might look like this:
+------+---------+---------+---------+
| User | Ranked1 | Ranked2 | Ranked3 |
+------+---------+---------+---------+
| A | Brand_A | Brand_D | Brand_C |
| B | Brand_D | NA | NA |
| C | Brand_D | Brand_B | Brand_A |
| D | Brand_B | Brand_C | NA |
+------+---------+---------+---------+
Assume that each customers ranking is exhaustive i.e. if I have only used one brand, that brand is automatically ranked 1.
I have tried using for loops to get the required output, but with no success. I think there is something rather simple that I am missing.
Upvotes: 4
Views: 781
Reputation: 8848
with tidyverse
...
df <- read.table(header = T, text = '
User Brand_A Brand_B Brand_C Brand_D
A 1 NA 3 2
B NA NA NA 1
C 3 2 4 1
D NA 1 2 NA
')
library(tidyverse)
df %>%
gather(brand, rank, -User, na.rm = T) %>%
filter(rank < 4) %>%
spread(rank, brand, sep = '')
generates...
User rank1 rank2 rank3
1 A Brand_A Brand_D Brand_C
2 B Brand_D <NA> <NA>
3 C Brand_D Brand_B Brand_A
4 D Brand_B Brand_C <NA>
Upvotes: 1
Reputation: 14360
One option would be to melt your data then recast it. This option using data.table
would look like:
library(data.table)
dcast(setDT(melt(data, id.vars = "user"))[, rank := paste0("Ranked",value)][!is.na(value),], user ~ rank, value.var = "variable")
# user Ranked1 Ranked2 Ranked3 Ranked4
#1 A Brand_A Brand_D Brand_C <NA>
#2 B Brand_D <NA> <NA> <NA>
#3 C Brand_D Brand_B Brand_A Brand_C
#4 D Brand_B Brand_C <NA> <NA>
Upvotes: 2
Reputation: 3875
You could do it using apply
:
df2=data.frame(User=df$User,t(apply(df,1,function(x) names(x)[-1][order(x[-1],na.last=NA)][1:3])))
colnames(df2)=c("User",paste0("Ranked",c(1:3)))
This returns:
User Ranked1 Ranked2 Ranked3
1 A Brand_A Brand_D Brand_C
2 B Brand_D <NA> <NA>
3 C Brand_D Brand_B Brand_A
4 D Brand_B Brand_C <NA>
Upvotes: 1