Flocke Haus

Reputation: 45

How to aggregate and plot data of a data frame

My raw data

head(predictionDB)
  Helpful X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
1       1  1  1  1  0  1  1  0  0  0   0   1   1   0   0   0   1   1   0   0   1   0   0
2       0  1  0  0  0  0  1  0  0  0   1   0   1   1   0   0   0   0   0   1   0   1   0

I aggregated it by Helpful class using the following code:

plotDB <- aggregate(predictionDB, 
                    list(predictionDB$Helpful), 
                    mean)

This is the output data

> plotDB
  Group.1 Helpful        X1        X2        X3        X4        X5        X6        X7        X8        X9       X10       X11      X12       X13       X14       X15       X16       X17       X18       X19       X20       X21       X22
1       0       0 0.1666192 0.1857021 0.2418114 0.2258616 0.1774423 0.1874110 0.2603247 0.1777271 0.1407007 0.1540872 0.1794361 0.174879 0.1859869 0.3691256 0.2574765 0.1569353 0.2455141 0.1726004 0.1572202 0.2016520 0.2267160 0.1911136
2       1       1 0.2896282 0.3180039 0.2896282 0.3072407 0.2666341 0.3228963 0.2793542 0.2818004 0.2504892 0.2607632 0.2588063 0.316047 0.3317025 0.2896282 0.3003914 0.2656556 0.3047945 0.2999022 0.3126223 0.3131115 0.2813112 0.3131115

Now I want to create a plot which includes the variables on the x axis in order to compare the means of all variables X for Helpful = 0 and Helpful = 1.

The following code gives me the plot I need for both Helpful classes with each variable, but there are no labels on the x-axis at all:

barplot(t(as.matrix(plotDB[,3:nTopicsLDA])), 
        beside=TRUE)

where nTopicsLDA is a numeric variable, in this case 22.

Thank you very much in advance!

Answers (1)

Parfait

Reputation: 107687

barplot takes its x-axis labels from the column names of the matrix it is given. Your plot shows none because t(as.matrix(...)) returns a matrix whose colnames are NULL:

colnames(t(as.matrix(plotDB[,3:nTopicsLDA])))
# NULL
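
As a minimal sketch of why the labels vanish, the toy data frame below (agg, with made-up values standing in for plotDB) shows that as.matrix() drops the automatic row names, so the transposed matrix has no column names. It also suggests one possible shortcut, assuming two bars per variable is the desired layout: plot the untransposed matrix so that the X names become the group labels. Indexing with -(1:2) keeps every X column, whereas 3:nTopicsLDA stops at column 22, which is X20 in plotDB.

```r
# Toy stand-in for the aggregated data (values are made up for illustration)
agg <- data.frame(Group.1 = c(0, 1), Helpful = c(0, 1),
                  X1 = c(0.17, 0.29), X2 = c(0.19, 0.32), X3 = c(0.24, 0.29))

# 2 x 3 matrix: rows = Helpful classes, columns = X variables
m <- as.matrix(agg[, -(1:2)])

colnames(m)      # the X names survive as column names
colnames(t(m))   # NULL, because as.matrix() dropped the automatic row names

# Untransposed, each column (variable) becomes one labelled group of two bars
mids <- barplot(m, beside = TRUE,
                legend.text = paste("Helpful =", agg$Helpful))
```

With the real data, the same call on as.matrix(plotDB[, -(1:2)]) should label each pair of bars with its variable name.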

Consider reshaping your wide data frame into long format (usually the preferred structure for most data analytics operations, including plotting), then build the plot matrix with tapply:

# RESHAPE WIDE TO LONG
predictionDB_long <- reshape(predictionDB, idvar = "Helpful",
                             varying=names(predictionDB)[-1], v.names="Value",
                             times = names(predictionDB)[-1], timevar = "X",
                             new.row.names = 1:1E5, direction="long")

# TAPPLY MEAN CALL ON TWO GROUPINGS FOR 2-D MATRIX
plot_mat <- with(predictionDB_long, tapply(Value, list(X, Helpful), mean))

# RE-ORDER ROWS (tapply SORTS THE X NAMES ALPHABETICALLY: X1, X10, X11, ...)
plot_mat <- plot_mat[paste0("X", 1:nTopicsLDA),]
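
If the long format is not needed elsewhere, a more compact route to a matrix of the same shape is to split the wide data frame by Helpful and take column means. This is only a sketch of an alternative, not a replacement for the reshape approach; the name alt_mat is introduced here for illustration and the data are the simulated values from the Data section.

```r
# Simulated data, as in the Data section of this answer
set.seed(9132019)
predictionDB <- data.frame(Helpful = sample(c(0, 1), 500, replace = TRUE),
                           replicate(22, sample(c(0, 1), 500, replace = TRUE)))

# Split the X columns by Helpful class, then average each column per class.
# Result: 22 x 2 matrix, rows = X1..X22 (original column order, so no
# re-ordering step is needed), columns = Helpful classes "0" and "1"
alt_mat <- sapply(split(predictionDB[-1], predictionDB$Helpful), colMeans)

dim(alt_mat)
head(alt_mat, 3)
```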

The colnames of plot_mat (the Helpful classes 0 and 1) then become the x-axis labels. However, only this one level of x-axis labels renders by default:

# BAR PLOT WITH ONE AXIS
barplot(plot_mat, ylim=c(0, 0.6), beside=TRUE, cex.names=0.75,
        main = "Mean Helpful Bar Plot")

Bar Plot with One Axis

For two levels of axis labels, you need a customized solution such as calling axis() and adjusting the hadj and padj parameters to suit the font size. Note how row.names(plot_mat) supplies the per-bar labels, and how the empty label accounts for the gap between the two sets of Helpful bars:

# BAR PLOT WITH TWO AXES
barplot(plot_mat, ylim=c(0, 0.6), beside=TRUE, cex.names=0.75,
        main = "Mean Helpful Bar Plot")

axis(1, at=1:(nTopicsLDA*2 + 2), hadj=-0.5, padj=-2,
     labels=c(row.names(plot_mat), "", row.names(plot_mat), ""), cex.axis=0.5)
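
A possibly more robust variant, sketched below, avoids hand-tuning at, hadj and padj by using the bar midpoints that barplot() returns invisibly. The choices of las, line and cex.axis are assumptions that may need adjusting for your device size, and plot_mat is rebuilt here with sapply() only to keep the sketch self-contained.

```r
# Simulated data, as in the Data section of this answer
set.seed(9132019)
predictionDB <- data.frame(Helpful = sample(c(0, 1), 500, replace = TRUE),
                           replicate(22, sample(c(0, 1), 500, replace = TRUE)))

# 22 x 2 matrix of means: rows = X variables, columns = Helpful classes
plot_mat <- sapply(split(predictionDB[-1], predictionDB$Helpful), colMeans)

# barplot() invisibly returns the bar midpoints (here a 22 x 2 matrix);
# axisnames = FALSE suppresses the default group labels
mids <- barplot(plot_mat, ylim = c(0, 0.6), beside = TRUE, axisnames = FALSE,
                main = "Mean Helpful Bar Plot")

# Inner labels: one variable name under each bar, rotated to limit overlap
axis(1, at = as.vector(mids),
     labels = rep(rownames(plot_mat), times = ncol(plot_mat)),
     las = 2, tick = FALSE, line = -0.5, cex.axis = 0.5)

# Outer labels: one Helpful label centred under each group of bars
mtext(paste("Helpful =", colnames(plot_mat)), side = 1, line = 2.5,
      at = colMeans(mids))
```

Because the label positions come from mids, they stay aligned with the bars if the space or width arguments of barplot() change.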

Bar Plot with Two Axes


Data

set.seed(9132019)
predictionDB <- data.frame(Helpful = sample(c(0, 1), 500, replace=TRUE),
                           replicate(22, sample(c(0, 1), 500, replace=TRUE))
)
nTopicsLDA <- ncol(predictionDB) - 1
