How to produce summary stats across multiple columns in R?

Question

I have a [1,758 x 38] data frame where each row is a job posting and the columns are the skills required for that posting (skill1 to skill38). Many postings share the same skills, but a given skill may be listed in a different column from one posting to the next. I would like to produce summary stats for the skills required (e.g., the most common skill required). I can produce this for a single column using data.table:

data[, .N, keyby = skills1]

But I am unable to implement a looping mechanism to go through each column. How do I do this?

akrun · Accepted Answer

You could do this in base R by using lapply to loop over the columns. The output will be a 'list'.

lapply(data, table)
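Note that lapply(data, table) counts each column separately, so a skill that appears in several columns is split across several tables. If the goal is the most common skill overall, one option is to pool all columns before tabulating. Below is a minimal sketch; the toy data frame and the names per_column and overall are made up for illustration and are not from the question.

```r
# Hypothetical toy data standing in for the 1,758 x 38 data frame
data <- data.frame(
  skill1 = c("R", "SQL", "Python"),
  skill2 = c("SQL", "R", "SQL"),
  skill3 = c("Python", NA, NA),
  stringsAsFactors = FALSE
)

# Per-column counts, as above: a list with one table per column
per_column <- lapply(data, table)

# Pooled counts: flatten every column into one vector, then tabulate.
# table() drops NA by default, so empty skill slots are ignored.
overall <- sort(table(unlist(data, use.names = FALSE)), decreasing = TRUE)
overall

# The most common skill is the first name in the sorted table
names(overall)[1]
```

The pooled table answers "which skill is most common", while the per-column list is more useful for checking how skills are spread across the skill1 to skill38 positions.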

Or, as @thelatemail mentioned, the 'wide' format can be converted to 'long' with 2 columns, and table can then be applied to the result:

library(reshape2)
table(melt(as.matrix(data))[-1])

A similar method using data.table would be

library(data.table)
setDT(melt(as.matrix(data))[-1])[, .N, .(Var2, value)]
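If you would rather stay within data.table and skip the matrix conversion, the same wide-to-long idea can be sketched with data.table's own melt method. This is only an illustration under assumptions: the toy data, the column names "column" and "skill", and the object names long and skill_counts are hypothetical, and the counts here are grouped by skill alone rather than by (Var2, value) as above.

```r
library(data.table)

# Hypothetical toy data standing in for the real data frame
data <- data.frame(
  skill1 = c("R", "SQL", "Python"),
  skill2 = c("SQL", "R", "SQL"),
  skill3 = c("Python", NA, NA),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (column, skill) pair; na.rm drops empty cells.
# as.data.table() copies, so the original data frame is left untouched.
long <- melt(as.data.table(data), measure.vars = names(data),
             variable.name = "column", value.name = "skill", na.rm = TRUE)

# Count each skill across all columns, most frequent first
skill_counts <- long[, .N, by = skill][order(-N)]
skill_counts
```

Grouping by .(column, skill) instead of skill gives the per-column breakdown, matching the shape of the result above.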

Or using mtabulate

library(qdapTools)
mtabulate(data)
