ricks.k

Reputation: 101

R: counting string values in each row of a dataframe

I have a dataframe that looks something like this, where each row represents a sample and contains repeats of the same strings:

> df
  V1 V2 V3 V4 V5
1  a  a  d  d  b
2  c  a  b  d  a
3  d  b  a  a  b
4  d  d  a  b  c
5  c  a  d  c  c

I want to create a new dataframe whose headers are the string values from the previous dataframe (a, b, c, d), and where each row holds the number of occurrences of each value in the corresponding row of the original dataframe. Using the example above, this would look like:

> df2
   a  b  c  d 
1  2  1  0  2  
2  2  1  1  1  
3  2  2  0  1
4  1  1  1  2  
5  1  0  3  1  

In my actual dataset there are hundreds of distinct values and thousands of samples, so ideally the names would be pulled automatically from the original dataframe and sorted alphabetically to form the headers of the new dataframe.

Upvotes: 3

Views: 604

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can stack the columns and then use table:

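# stack() puts every column into one long "values" column;
# id is recycled so each value keeps its original row number
table(cbind(id = 1:nrow(df), 
            stack(lapply(df, as.character)))[c("id", "values")])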
#    values
# id  a b c d
#   1 2 1 0 2
#   2 2 1 1 1
#   3 2 2 0 1
#   4 1 1 1 2
#   5 1 0 3 1
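Since the question asks for a dataframe rather than a table, here is a minimal sketch of the same stack-and-tabulate idea with a conversion step at the end. It assumes the example data from the question, rebuilt here as character columns; the names `long`, `tab`, and `counts` are only illustrative.

```r
# Example data from the question, assumed to be character columns
df <- data.frame(
  V1 = c("a", "c", "d", "d", "c"),
  V2 = c("a", "a", "b", "d", "a"),
  V3 = c("d", "b", "a", "a", "d"),
  V4 = c("d", "d", "a", "b", "c"),
  V5 = c("b", "a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Long format: one row per cell, stacked column by column
long <- stack(lapply(df, as.character))

# Row number of each cell in the original data
long$id <- rep(seq_len(nrow(df)), times = ncol(df))

# Cross-tabulate row id against value; levels come out sorted alphabetically
tab <- table(long[c("id", "values")])

# Convert the 2-D table into an ordinary data frame (one column per value)
counts <- as.data.frame.matrix(tab)
counts
```

`as.data.frame.matrix()` is used instead of `as.data.frame()` because the latter would turn the table back into long format with a `Freq` column.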

Upvotes: 1

akrun

Reputation: 887118

You may try

library(qdapTools)
mtabulate(as.data.frame(t(df)))

Or

mtabulate(split(as.matrix(df), row(df)))

Or using base R

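# All distinct values across the dataframe, sorted alphabetically
Un1 <- sort(unique(unlist(df)))
# Tabulate each row against the full set of levels so absent values count as 0
t(apply(df, 1, function(x) table(factor(x, levels=Un1))))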
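With thousands of rows, it might also be worth trying a variant that loops over the distinct values instead of over the rows. The sketch below is not from either answer. It assumes the example data from the question, and the names `m`, `lv`, and `counts` are hypothetical. Whether it is actually faster on real data would need benchmarking.

```r
# Example data from the question, assumed to be character columns
df <- data.frame(
  V1 = c("a", "c", "d", "d", "c"),
  V2 = c("a", "a", "b", "d", "a"),
  V3 = c("d", "b", "a", "a", "d"),
  V4 = c("d", "d", "a", "b", "c"),
  V5 = c("b", "a", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Work on a character matrix so each comparison is a single vectorised pass
m <- as.matrix(df)

# Distinct values, sorted alphabetically; these become the column headers
lv <- sort(unique(as.character(m)))

# For each value, count how many cells in every row equal it
counts <- vapply(
  lv,
  function(v) rowSums(m == v, na.rm = TRUE),
  numeric(nrow(m))
)

counts <- as.data.frame(counts)
counts
```

Each column of `counts` corresponds to one distinct value, and each row to one sample of the original dataframe.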

Upvotes: 3
