Reputation: 85
I have a [1,758 x 38] data frame where each row is a job posting while the columns are skills required for each posting (skill1 to skill38). Most job postings have a number of identical skills, except that they are listed in different columns. I would like to produce summary stats for the skills required (e.g, most common skill required). I can produce this for a single column using data.table
:
data[, .N, keyby = skills1]
But I am unable to implement a looping mechanism to go through each column. How do I do this?
Upvotes: 2
Views: 679
Reputation: 8377
I use apply inside a wrapper function called sumstats that produces the main statistical indicators:
CV = function(x, ...) {sd(x, ...)/mean(x, ...)}
sumstats=function(x, ...) {
mean.k=function(x) {if (is.numeric(x)) round(mean(x, ...), digits = 2)
else "N*N"}
median.k=function(x) { if (is.numeric(x)) round(median(x, ...), digits = 2)
else "N*N"}
sd.k=function(x) { if (is.numeric(x)) round(sd(x, ...), digits = 2)
else "N*N"}
cv.k=function(x) { if (is.numeric(x)) round(CV(x, ...), digits = 2)
else "N*N"}
min.k=function(x) { if (is.numeric(x)) round(min(x, ...), digits = 2)
else "N*N"}
max.k=function(x) { if (is.numeric(x)) round(max(x, ...), digits = 2)
else "N*N"}
sumtable <- cbind(as.matrix(colSums(!is.na(x))), sapply(x,mean.k), sapply(x,median.k), sapply(x,sd.k), sapply(x,cv.k), sapply(x,min.k), sapply(x,max.k))
sumtable <- as.data.frame(sumtable); names(sumtable) <- c("N.obs","Moy","Med","sd","CV", "min","max")
return(sumtable)
}
The [...] allows you to add a na.rm=T argument
> head(sumstats(mtcars), 3)
### N.obs Moy Med sd CV min max
### mpg 32 20.09 19.20 6.03 0.30 10.40 33.90
### cyl 32 6.19 6.00 1.79 0.29 4.00 8.00
### disp 32 230.72 196.30 123.94 0.54 71.10 472.00
Note: it doesn't work if you have only one column!
Upvotes: 0
Reputation: 886928
You could do this in base R
by using lapply
to loop over the columns. The output will be a 'list'.
lapply(data, table)
Or @thelatemail mentioned, the 'wide' format can be converted to 'long' with 2 columns and then do the table
library(reshape2)
table(melt(as.matrix(data))[-1])
A similar method using data.table
would be
library(data.table)
setDT(melt(as.matrix(data))[-1])[, .N, .(Var2, value)]
Or using mtabulate
library(qdapTools)
mtabulate(data)
Upvotes: 3