Reputation: 45
My raw data
head(predictionDB)
Helpful X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
1 1 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0
2 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 1 0 1 0
I have aggregated them using the following code:
plotDB <- aggregate(predictionDB,
list(predictionDB$Helpful),
mean)
This is the output data
> plotDB
Group.1 Helpful X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
1 0 0 0.1666192 0.1857021 0.2418114 0.2258616 0.1774423 0.1874110 0.2603247 0.1777271 0.1407007 0.1540872 0.1794361 0.174879 0.1859869 0.3691256 0.2574765 0.1569353 0.2455141 0.1726004 0.1572202 0.2016520 0.2267160 0.1911136
2 1 1 0.2896282 0.3180039 0.2896282 0.3072407 0.2666341 0.3228963 0.2793542 0.2818004 0.2504892 0.2607632 0.2588063 0.316047 0.3317025 0.2896282 0.3003914 0.2656556 0.3047945 0.2999022 0.3126223 0.3131115 0.2813112 0.3131115
Now I want to create a plot with the variables on the x-axis, so that I can compare the means of every variable X for Helpful = 0 and Helpful = 1.
The following code gives me the plot I need, showing both Helpful classes for each variable, but there are no labels on the x-axis at all:
barplot(t(as.matrix(plotDB[,3:nTopicsLDA])),
beside=TRUE)
where nTopicsLDA is a numeric variable, in this case 22.
Thank you very much in advance!
Upvotes: 1
Views: 75
Reputation: 107687
Since barplot uses the underlying matrix's column names for x-axis labels, your plot renders no x-axis labels because t(as.matrix(...)) returns NULL colnames:
colnames(t(as.matrix(plotDB[,3:nTopicsLDA])))
# NULL
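To make the cause concrete, here is a minimal sketch on simulated data (an assumption, not the asker's real data). It compares the transposed matrix with the untransposed one, whose colnames are the variable names, so barplot can label each pair of bars. Dropping t() is only a possible quick workaround if one pair of bars per variable is acceptable; the legend text is illustrative.

```r
# Simulated stand-in for the asker's data (assumption: 22 binary X columns)
set.seed(1)
predictionDB <- data.frame(Helpful = sample(c(0, 1), 500, replace = TRUE),
                           replicate(22, sample(c(0, 1), 500, replace = TRUE)))
plotDB <- aggregate(predictionDB, list(predictionDB$Helpful), mean)

# -(1:2) drops Group.1 and Helpful; note that 3:nTopicsLDA with
# nTopicsLDA = 22 would stop at column 22, i.e. at X20
m_t <- t(as.matrix(plotDB[, -(1:2)]))
colnames(m_t)   # no column names, hence no x-axis labels

# Without t(), the variables stay in the columns and carry their names
m <- as.matrix(plotDB[, -(1:2)])
colnames(m)

# One group of two bars (Helpful = 0, Helpful = 1) per variable
barplot(m, beside = TRUE, las = 2, cex.names = 0.75,
        legend.text = paste("Helpful =", plotDB$Helpful))
```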
Consider reshaping your wide data frame into long format (usually the preferred structure for most data analytics operations, including plotting) and building the plot matrix with tapply:
# RESHAPE WIDE TO LONG
predictionDB_long <- reshape(predictionDB, idvar = "Helpful",
varying=names(predictionDB)[-1], v.names="Value",
times = names(predictionDB)[-1], timevar = "X",
new.row.names = 1:1E5, direction="long")
# TAPPLY MEAN CALL ON TWO GROUPINGS FOR 2-D MATRIX
plot_mat <- with(predictionDB_long, tapply(Value, list(X, Helpful), mean))
# RE-ORDER ROWS (X1, X2, ..., NOT ALPHABETICAL X1, X10, X11, ...)
plot_mat <- plot_mat[paste0("X", 1:nTopicsLDA),]
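The last step matters because tapply sorts the grouping values as character strings, so X10 lands before X2. A small sketch of that behaviour, using a made-up 12-variable long data frame (the names long and mat are illustrative only):

```r
# Toy long-format data: 12 variables, two Helpful classes, one value each
long <- data.frame(X = rep(paste0("X", 1:12), each = 2),
                   Helpful = rep(c(0, 1), times = 12),
                   Value = seq_len(24))

mat <- with(long, tapply(Value, list(X, Helpful), mean))
before <- rownames(mat)   # character sort: X1, X10, X11, X12, X2, ...

# Index the rows by name to restore the natural numeric order
mat <- mat[paste0("X", 1:12), ]
after <- rownames(mat)    # X1, X2, ..., X12

before
after
```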
Doing so, the colnames of plot_mat (the Helpful values 0 and 1) become the x-axis labels. However, only one level of x-axis labels renders by default:
# BAR PLOT WITH ONE AXIS
barplot(plot_mat, ylim=c(0, 0.6), beside=TRUE, cex.names=0.75,
main = "Mean Helpful Bar Plot")
For two levels of labels, you need a customized solution such as calling axis() and adjusting its horizontal (hadj) and vertical (padj) parameters to suit the font size. Note how row.names of plot_mat supplies the per-bar labels, and how an empty label accounts for the gap between the two sets of Helpful bars:
# BAR PLOT WITH TWO AXES
barplot(plot_mat, ylim=c(0, 0.6), beside=TRUE, cex.names=0.75,
main = "Mean Helpful Bar Plot")
axis(1, at=1:(nTopicsLDA*2 + 2), hadj=-0.5, padj=-2,
labels=c(row.names(plot_mat), "", row.names(plot_mat), ""), cex.axis=0.5)
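As an alternative to hard-coding at=1:(nTopicsLDA*2 + 2) and tuning hadj/padj by eye, the sketch below uses the bar midpoints that barplot() returns. The name bp, the sapply/colMeans shortcut for building plot_mat, the ylim, and the mtext placement are my assumptions for illustration, not part of the code above.

```r
set.seed(9132019)
predictionDB <- data.frame(Helpful = sample(c(0, 1), 500, replace = TRUE),
                           replicate(22, sample(c(0, 1), 500, replace = TRUE)))

# Shortcut for the same 22 x 2 matrix of means (rows X1..X22, columns 0 and 1)
plot_mat <- sapply(split(predictionDB[-1], predictionDB$Helpful), colMeans)

# barplot() returns the x midpoint of every bar: one row per bar, one column per group
bp <- barplot(plot_mat, ylim = c(0, 0.7), beside = TRUE, axisnames = FALSE,
              main = "Mean Helpful Bar Plot")

# Per-bar labels placed exactly under each bar, drawn vertically
axis(1, at = as.vector(bp), labels = rep(rownames(plot_mat), times = ncol(plot_mat)),
     las = 2, cex.axis = 0.5, tick = FALSE)

# Group labels centred under each set of bars
mtext(paste("Helpful =", colnames(plot_mat)), side = 1, line = 3, at = colMeans(bp))
```

Because the positions come from barplot() itself, the labels stay aligned if the number of variables or the bar spacing changes.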
Data
set.seed(9132019)
predictionDB <- data.frame(Helpful = sample(c(0, 1), 500, replace=TRUE),
replicate(22, sample(c(0, 1), 500, replace=TRUE))
)
nTopicsLDA <- ncol(predictionDB) - 1
Upvotes: 1