Plotting ranked data

Question

I have ranked data from a survey in which participants ranked items (from a pool of n = 50) in order from 1 to 5.

The data looks similar to the following example:

Rank1 <- c("Item 1", "Item 3", "Item 6")
Rank2 <- c("Item 6", "Item 9", "Item 10")
Rank3 <- c("Item 45", "Item 6", "Item 10")
Rank4 <- c("Item 12", "Item 32", "Item 34")
Rank5 <- c("Item 22", "Item 5", "Item 21")

df <- data.frame(Rank1, Rank2, Rank3, Rank4, Rank5)

To get a better overview of the data, I transposed the data frame and converted the string values (e.g. "Item 1") into numeric values by assigning each string a unique number. The data then looks as follows (a simplified example along the same lines, with numeric entries; the columns P1-P5 are the participants' responses and the rows are the ranks 1-5):

P1 <- c(1, 3, 6, 3, 40)
P2 <- c(6, 9, 10, 11, 30)
P3 <- c(1, 3, 10, 11, 30)
P4 <- c(1, 3, 10, 2, 5)
P5 <- c(22, 5, 21, 11, 30)

df <- data.frame(P1, P2, P3, P4, P5)
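
The recoding and transposing step described above is not shown in the question. A minimal base R sketch of one way to do it, assuming each label simply gets a code in order of first appearance (the names `ranked`, `item_levels`, `coded` and `coded_by_rank` are made up for illustration):

```r
# First example from the question: one column per rank, one row per participant
ranked <- data.frame(
  Rank1 = c("Item 1", "Item 3", "Item 6"),
  Rank2 = c("Item 6", "Item 9", "Item 10"),
  Rank3 = c("Item 45", "Item 6", "Item 10"),
  Rank4 = c("Item 12", "Item 32", "Item 34"),
  Rank5 = c("Item 22", "Item 5", "Item 21"),
  stringsAsFactors = FALSE
)

# One shared lookup across all columns, so an item gets the same code everywhere
# (assumption: codes follow order of first appearance, not the number in the label)
item_levels <- unique(unlist(ranked, use.names = FALSE))
coded <- as.data.frame(lapply(ranked, function(col) match(col, item_levels)))

# Transpose so that rows are ranks and columns are participants
coded_by_rank <- as.data.frame(t(coded))
names(coded_by_rank) <- paste0("P", seq_len(ncol(coded_by_rank)))
```

Keeping `item_levels` around makes it possible to map the codes back to the original labels later, e.g. `item_levels[coded$Rank1]`.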

Now that I have the numeric entries per participant, assigned to ranks 1-5, I tried to count how often each item occurs at each rank (i.e. in each row) with:

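# Compare only the participant columns (1:5); otherwise the count columns
# added in earlier lines are matched as well and inflate later counts.
# Note: the suffix in item_k is a running index, not the item code itself.
df$item_1 <- rowSums(df[1:5] == 1)
df$item_2 <- rowSums(df[1:5] == 2)
df$item_3 <- rowSums(df[1:5] == 3)
df$item_4 <- rowSums(df[1:5] == 5)
df$item_5 <- rowSums(df[1:5] == 6)
df$item_6 <- rowSums(df[1:5] == 9)
df$item_7 <- rowSums(df[1:5] == 10)
df$item_8 <- rowSums(df[1:5] == 40)
df$item_9 <- rowSums(df[1:5] == 11)
df$item_10 <- rowSums(df[1:5] == 30)
df$item_11 <- rowSums(df[1:5] == 22)
df$item_12 <- rowSums(df[1:5] == 21)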

Next, I extracted only the counts for each item into a separate data frame and added a variable that counts, per row, the number of items with at least one entry (> 0), as a reference:

df_counts <- subset(df, select = 6:17)
df_counts$counts <- Reduce(`+`, lapply(df_counts, `>`, 0))

I used this reference to compute the relative frequency of each item at each rank (1-5), dividing each item's count in a row by that row's reference value:

library(data.table)   # provides setDT() and .SD
df_counts <- setDT(df_counts)[, .SD / counts]
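
For comparison, here is a hedged base R sketch of the counting and normalizing steps in one pass. It divides by the number of responses at each rank rather than by the number of distinct items, which is an assumption about the intended meaning of "relative frequency"; under that reading the shares in every row sum to 1. The names `responses`, `rank_id`, `item_id`, `rank_item_counts` and `shares` are illustrative only.

```r
# Example data from the question: rows are ranks 1-5, columns are participants
responses <- data.frame(
  P1 = c(1, 3, 6, 3, 40),
  P2 = c(6, 9, 10, 11, 30),
  P3 = c(1, 3, 10, 11, 30),
  P4 = c(1, 3, 10, 2, 5),
  P5 = c(22, 5, 21, 11, 30)
)

# Flatten column by column: one (rank, item) pair per cell
rank_id <- rep(seq_len(nrow(responses)), times = ncol(responses))
item_id <- unlist(responses, use.names = FALSE)

# Cross-tabulate rank by item, then divide each row by its total
rank_item_counts <- table(Ranking = rank_id, Item = item_id)
shares <- prop.table(rank_item_counts, margin = 1)
```

`table()` finds the distinct items by itself, so this scales to 50 items without one `rowSums()` line per item.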

Finally I tried to plot the frequencies by first reshaping the dataframe with the melt function (reshape2) and using the geom_bar function within ggplot2. The intention here was to show the relative frequency of each item within each ranking (1-5).

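# df_final: presumably the relative-frequency table from the previous step
# (its definition is not shown here)
df_final_1 <- reshape2::melt(df_final)
df_final_1$rowid <- 1:5   # rank index, recycled across the melted item columns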

ggplot(df_final_1, aes(x = rowid, y = value, fill = factor(variable))) +
  geom_bar(stat = "identity", width = 0.8) +
  labs(x = "Ranking", y = "Share")

Ignoring the labelling at this stage, the plotting works, but it gets very messy because the real dataset has roughly 50 items rather than the 12 shown in this example. I have experimented a bit with the ggplot2 functions, but I still don't get a plot that clearly shows the distribution of items across the ranks.

Surely the procedure is quite amateurish, so I would be glad to receive recommendations on how to improve it and how to illustrate the share of the different items at each rank.

Thanks to @Jon Spring, I have found an easier way to plot the frequencies for each ranking option. However, I still face some challenges in visualizing the data when I apply it to n=50 items (variables 0-1 seem to take most of the share; 2-4 are there, but not visible). I would be thankful for any recommendations!

Jon Spring · Accepted Answer

I think the calculation will be much simpler if you convert the data to longer form.

library(tidyverse)         # uses dplyr, tidyr::pivot_longer, and ggplot2
df %>%
  mutate(Ranking = row_number()) %>%       # make row position explicit 
  pivot_longer(-Ranking) %>%               # convert to longer form
  mutate(variable = as.factor(value)) %>%  # make variable a factor
  count(Ranking, variable) %>%             # count combos of rank & variable
  ggplot(aes(Ranking, n, fill = variable)) +  # Plot!
  geom_col(position = "fill")              # Normalize column height to 1
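
With roughly 50 items the stacked bars get crowded. One possible way to keep the plot readable, which is not part of the answer above, is to keep only the most frequent items and pool the rest. The sketch below does the pooling in base R; the cutoff `top_k`, the label "Other", and all variable names are assumptions chosen for illustration.

```r
# Same example data as in the question (rows = ranks, columns = participants)
responses <- data.frame(
  P1 = c(1, 3, 6, 3, 40),
  P2 = c(6, 9, 10, 11, 30),
  P3 = c(1, 3, 10, 11, 30),
  P4 = c(1, 3, 10, 2, 5),
  P5 = c(22, 5, 21, 11, 30)
)

# Long form: one row per (rank, participant) cell
long <- data.frame(
  Ranking = rep(seq_len(nrow(responses)), times = ncol(responses)),
  item    = as.character(unlist(responses, use.names = FALSE)),
  stringsAsFactors = FALSE
)

# Keep the top_k most frequent items overall; pool everything else
top_k <- 5   # hypothetical cutoff, tune for the real data
item_freq <- sort(table(long$item), decreasing = TRUE)
keep <- names(item_freq)[seq_len(min(top_k, length(item_freq)))]
long$item_group <- factor(
  ifelse(long$item %in% keep, long$item, "Other"),
  levels = c(keep, "Other")
)

# Counts per rank and item group, ready for plotting
lumped <- as.data.frame(
  table(Ranking = long$Ranking, item_group = long$item_group),
  responseName = "n"
)

# Same plot type as in the answer; skipped if ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(lumped, ggplot2::aes(Ranking, n, fill = item_group)) +
    ggplot2::geom_col(position = "fill")
  print(p)
}
```

Pooling hides detail about the rare items, so whether this is appropriate depends on whether those items matter for the analysis.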
