Reputation: 101

R counting strings variables in each row of a dataframe

I have a dataframe that looks something like this, where each row represents a sample and contains repeats of the same strings:

> df
  V1 V2 V3 V4 V5
1  a  a  d  d  b
2  c  a  b  d  a
3  d  b  a  a  b
4  d  d  a  b  c
5  c  a  d  c  c

I want to create a new dataframe whose headers are the string values found in the original dataframe (a, b, c, d) and whose rows contain the number of occurrences of each value in the corresponding row of the original dataframe. Using the example above, this would look like:

> df2
   a  b  c  d
1  2  1  0  2
2  2  1  1  1
3  2  2  0  1
4  1  1  1  2
5  1  0  3  1

In my actual dataset, there are hundreds of variables, and thousands of samples, so it'd be ideal if I could automatically pull out the names from the original dataframe, and alphabetize them into the headers for the new dataframe.

Upvotes: 3

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

You can stack the columns and then use table:

table(cbind(id = 1:nrow(df), 
            stack(lapply(df, as.character)))[c("id", "values")])
#    values
# id  a b c d
#   1 2 1 0 2
#   2 2 1 1 1
#   3 2 2 0 1
#   4 1 1 1 2
#   5 1 0 3 1
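
To make the stacking step more explicit, here is a minimal base R sketch of the same idea. It assumes the question's data is held in character columns; the names `long`, `counts` and `df2` are illustrative only and are not part of the answer above.

```r
# Example data copied from the question (character columns assumed)
df <- data.frame(
  V1 = c("a", "c", "d", "d", "c"),
  V2 = c("a", "a", "b", "d", "a"),
  V3 = c("d", "b", "a", "a", "d"),
  V4 = c("d", "d", "a", "b", "c"),
  V5 = c("b", "a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Long format: one row per cell, tagged with its original row number.
# unlist() walks the columns in order, so the row ids repeat once per column.
long <- data.frame(
  id    = rep(seq_len(nrow(df)), times = ncol(df)),
  value = unlist(lapply(df, as.character), use.names = FALSE)
)

# Cross-tabulate row id against value; the distinct values become sorted columns
counts <- table(long$id, long$value)

# Convert the contingency table into a regular data frame
df2 <- as.data.frame.matrix(counts)
df2
```

For the example data this should give the same counts as the table output shown above, with `as.data.frame.matrix()` turning the result into the dataframe the question asks for.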

Upvotes: 1

akrun

Reputation: 887951

You may try either of the following:

library(qdapTools)
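# Option 1: transpose so that each original row becomes a column
mtabulate(as.data.frame(t(df)))

# Option 2: split the values into one vector per row
mtabulate(split(as.matrix(df), row(df)))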

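Or, using base R:

# Sorted distinct values; these become the column names
Un1 <- sort(unique(unlist(df)))
# Tabulate each row against the full set of levels so zero counts are kept
t(apply(df, 1, function(x) table(factor(x, levels = Un1))))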
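
A self-contained sketch of the base R approach on the question's example data follows. It assumes character columns, and the names `lv`, `counts`, `alt` and `df2` are illustrative. The `rowSums()` cross-check is an alternative added here for comparison, not something taken from the answer.

```r
# Example data copied from the question (character columns assumed)
df <- data.frame(
  V1 = c("a", "c", "d", "d", "c"),
  V2 = c("a", "a", "b", "d", "a"),
  V3 = c("d", "b", "a", "a", "d"),
  V4 = c("d", "d", "a", "b", "c"),
  V5 = c("b", "a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# All distinct strings, sorted; these become the column names
lv <- sort(unique(unlist(df)))

# Fixing the factor levels keeps zero counts, so every row yields a vector
# of the same length and apply() can assemble the results into a matrix
counts <- t(apply(df, 1, function(x) table(factor(x, levels = lv))))

# Alternative: for each value, count the matching cells in every row
alt <- sapply(lv, function(v) rowSums(df == v))

# Regular data frame with one column per distinct string
df2 <- as.data.frame(counts)
df2
```

Without `levels = lv`, rows that lack some value would return shorter tables, and `apply()` could no longer combine them into a single matrix.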

Upvotes: 3
