Reputation: 137
My starting point is several character vectors containing POS tags that I extracted from texts. For example:
c("NNS", "VBP", "JJ", "CC", "DT")
c("NNS", "PRP", "JJ", "RB", "VB")
I use table() or ftable() to count the occurrences of each tag. For the first vector, table() gives:
CC DT JJ NNS VBP
1 1 1 1 1
The goal is a data.frame like this, with one row per vector and a 0 wherever a tag does not occur:
  NNS VBP PRP JJ CC RB DT VB
1   1   1   0  1  1  0  1  0
2   1   0   1  1  0  1  0  1
Using plyr::rbind.fill seems reasonable here, but it needs data.frame objects as input. However, as.data.frame.matrix(table(POS_vector)) throws an error:
Error in seq_len(ncols) :
argument must be coercible to non-negative integer
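A likely explanation, sketched in base R under the assumption that POS_vector holds the first example vector: table() on a single vector builds a one-dimensional table, so as.data.frame.matrix() finds no second dimension to take a column count from. as.data.frame() does accept such a table, but it returns long format (one row per tag plus a Freq column) rather than one row per vector.

```r
# Assumption: POS_vector holds the first example vector from the question
POS_vector <- c("NNS", "VBP", "JJ", "CC", "DT")
tab <- table(POS_vector)

# Only one dimension: there is no column count for as.data.frame.matrix()
length(dim(tab))  # 1

# as.data.frame() dispatches to as.data.frame.table() and works, but the
# result is long: a POS_vector column (the tags) and a Freq column (the counts)
long <- as.data.frame(tab)
long
```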
Using as.data.frame.matrix(ftable(POS_vector)) does produce a data.frame, but the tag names are lost as column names:
V1 V2 V3 V4 V5 ...
1 1 1 1 1
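The per-vector approach can also be finished without plyr. The following is one base-R sketch, not the only way; pos_list, to_row and all_tags are hypothetical names. Transposing the one-column matrix from as.matrix(table(x)) yields a one-row data frame whose column names are the tags, and each row is then padded with 0 for the tags it lacks before rbind(). (plyr::rbind.fill() would fill those missing tags with NA rather than 0.)

```r
# Assumption: the tag vectors are collected in a list (called pos_list here)
pos_list <- list(c("NNS", "VBP", "JJ", "CC", "DT"),
                 c("NNS", "PRP", "JJ", "RB", "VB"))

# One vector's counts as a one-row data frame named by tag:
# as.matrix() gives a k x 1 matrix, t() turns it into 1 x k
to_row <- function(x) as.data.frame(t(as.matrix(table(x))))
rows <- lapply(pos_list, to_row)

# Every tag seen in any vector, in order of first appearance
all_tags <- unique(unlist(pos_list))

# Add a 0 column for each missing tag, then use one shared column order
filled <- lapply(rows, function(r) {
  for (tag in setdiff(all_tags, names(r))) r[[tag]] <- 0L
  r[all_tags]
})

# rbind() on data frames matches columns by name
result <- do.call(rbind, filled)
result
```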
Any help is highly appreciated.
Upvotes: 1
Views: 989
Reputation: 193517
In base R, you can try:
table(rev(stack(setNames(dat, seq_along(dat)))))
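The one-liner above can be unpacked step by step. This is a base-R sketch; the intermediate names long, counts and wide are only for illustration. Because the table here is two-dimensional (vector id by tag), the question's as.data.frame.matrix() now works on it and returns the wide data.frame directly.

```r
dat <- list(c("NNS", "VBP", "JJ", "CC", "DT"),
            c("NNS", "PRP", "JJ", "RB", "VB"))

# Name each vector by its position; stack() then gives one row per tag,
# with columns 'values' (the tag) and 'ind' (which vector it came from)
long <- stack(setNames(dat, seq_along(dat)))

# rev() swaps the two columns so that 'ind' becomes the rows of the table
counts <- table(rev(long))

# A two-dimensional table converts cleanly with as.data.frame.matrix()
wide <- as.data.frame.matrix(counts)
wide
```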
You can also use mtabulate from "qdapTools":
library(qdapTools)
mtabulate(dat)
#   CC DT JJ NNS PRP RB VB VBP
# 1  1  1  1   1   0  0  0   1
# 2  0  0  1   1   1  1  1   0
Here, dat is the same list as defined in @Heroka's answer:
dat <- list(c("NNS", "VBP", "JJ", "CC", "DT"),
c("NNS", "PRP", "JJ", "RB", "VB"))
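If qdapTools is not available, a similar table can be built in base R by counting every vector against one shared set of factor levels. This is a sketch of that idea, not how mtabulate() is implemented; levs, counts and wide are illustrative names.

```r
dat <- list(c("NNS", "VBP", "JJ", "CC", "DT"),
            c("NNS", "PRP", "JJ", "RB", "VB"))

# One shared, sorted set of tags; counting each vector against it means a
# tag that a vector lacks gets a count of 0 instead of a missing column
levs <- sort(unique(unlist(dat)))

# Each table() has length(levs) entries; sapply() stacks them as columns,
# so t() puts one vector per row
counts <- t(sapply(dat, function(x) table(factor(x, levels = levs))))
wide <- as.data.frame(counts)
wide
```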
Upvotes: 3
Reputation: 13139
It's probably a bit of a workaround, but this might be a solution.
We assume all our vectors are in a list:
dat <- list(c("NNS", "VBP", "JJ", "CC", "DT"),
c("NNS", "PRP", "JJ", "RB", "VB"))
Then we convert each vector's table to a matrix, transpose it into a single row, and turn that row into a data.table:
library(data.table)
temp <- lapply(dat,function(x){
data.table(t(as.matrix(table(x))))
})
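Then we use rbindlist to create the desired output. With fill = TRUE, a tag that is missing from a vector comes out as NA, so we replace those NAs with 0:
res <- rbindlist(temp, fill = TRUE)
res[is.na(res)] <- 0L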
Alternatively, we can put all the data in a data.table first and then aggregate. Note that this assumes all vectors have the same length.
temp <- as.data.table(dat)
#turn to long format
temp_m <- melt(temp, measure.vars=colnames(temp))
#count values for each variable/value-combination, then reshape to wide
res <- dcast(temp_m[,.N,by=.(variable,value)], variable~value,value.var="N", fill=0)
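The melt/count/dcast pipeline has a compact base-R counterpart in xtabs(). In this sketch the long data frame is built with rep() and lengths(), so it does not need the vectors to have equal lengths, unlike the data.table route above. The column names variable and value only mirror melt()'s output and are an illustrative choice.

```r
dat <- list(c("NNS", "VBP", "JJ", "CC", "DT"),
            c("NNS", "PRP", "JJ", "RB", "VB"))

# Long format, similar to what melt() produces: one row per tag,
# with 'variable' recording which vector (V1, V2, ...) it came from
long <- data.frame(
  variable = rep(paste0("V", seq_along(dat)), lengths(dat)),
  value    = unlist(dat)
)

# xtabs() counts every variable/value combination and lays the result out
# wide; combinations that never occur are 0, much like fill = 0 in dcast()
wide <- as.data.frame.matrix(xtabs(~ variable + value, data = long))
wide
```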
Upvotes: 2