oski86

Reputation: 865

How to merge vectors and count factor occurrences per column

I'm trying to merge multiple factor vectors (levels A, B, C, D, E) into a data frame or matrix. They all have the same number of elements (20) and look like this:

> line1
 [1] B C C D A B D E C A B E B A D E C C A B
Levels: A B C D E
> typeof(line1)
[1] "integer"
> line2
 [1] B E E A C E D B B D C C A A E E A A E B
Levels: A B C D E
> typeof(line2)
[1] "integer"
> (...)
> line10
 [1] B E E A C E D B B C D C A A E E C A E B
Levels: A B C D E

The goal is to count, for each position (column), how often each level occurs across all objects (line1 to line n). Let's say n = 10. The output should look like this (the counts below are based on the three vectors shown above):

    A B C D E
1:  0 3 0 0 0
2:  0 0 1 0 2
3:  0 0 1 0 2
(...)
20: 0 3 0 0 0

How can I start? Thanks!

Upvotes: 1

Views: 125

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193537

You're essentially asking for table:

table(
  cbind(
    id = 1:20,                                   ## index of position of vector element
    stack(
      lapply(mget(ls(pattern = "line\\d+")),     ## collect all "line" vecs in a list
             as.character)))[c("id", "values")]) ## stack doesn't work with factors
#     values
# id   A B C D E
#   1  0 3 0 0 0
#   2  0 0 1 0 2
#   3  0 0 1 0 2
#   4  2 0 0 1 0
#   5  1 0 2 0 0
#   6  0 1 0 0 2
#   7  0 0 0 3 0
#   8  0 2 0 0 1
#   9  0 2 1 0 0
#   10 1 0 1 1 0
#   11 0 1 1 1 0
#   12 0 0 2 0 1
#   13 2 1 0 0 0
#   14 3 0 0 0 0
#   15 0 0 0 1 2
#   16 0 0 0 0 3
#   17 1 0 2 0 0
#   18 2 0 1 0 0
#   19 1 0 0 0 2
#   20 0 3 0 0 0

What the above does:

  • mget: Collects all objects named line1, line2, and so on into a single named list.
  • lapply(., as.character): Converts the factors to character vectors, since stack drops factor elements.
  • stack: Creates a two-column data.frame version of the list, where the values are stored in a column called "values" and the name of the list element each value came from is stored in a column called "ind". That second column is not needed.
  • cbind(id = 1:20, .): Adds an "id" column that represents the position (from 1 to 20) of the value in its vector. The values 1 to 20 are recycled to fill all of the stacked rows.
  • table(.[c("id", "values")]): Tabulates just the "id" and "values" columns, giving one row per position and one column per level.
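The steps above can also be run one at a time, which makes it easier to inspect the intermediate objects. This is only a sketch: it recreates the three vectors printed in the question, and the names lv, vecs, chars, long and counts are chosen here for illustration. It also builds the "id" column with an explicit rep() rather than relying on recycling.

```r
# Recreate the three vectors shown in the question as factors
lv <- c("A", "B", "C", "D", "E")
line1  <- factor(strsplit("BCCDABDECABEBADECCAB", "")[[1]], levels = lv)
line2  <- factor(strsplit("BEEACEDBBDCCAAEEAAEB", "")[[1]], levels = lv)
line10 <- factor(strsplit("BEEACEDBBCDCAAEECAEB", "")[[1]], levels = lv)

# Step 1: collect the vectors in a named list
vecs <- mget(c("line1", "line2", "line10"))

# Step 2: convert each factor to a character vector
chars <- lapply(vecs, as.character)

# Step 3: stack into a long data.frame with columns "values" and "ind"
long <- stack(chars)

# Step 4: add the position of each value within its original vector
long$id <- rep(seq_along(line1), times = length(vecs))

# Step 5: cross-tabulate position against value
counts <- table(long[c("id", "values")])
counts[1:3, ]
```

If some level never occurs in any vector, its column will be missing from counts, because the values were converted to character before tabulating.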

You could also do something like:

t(Reduce("+", lapply(mget(ls(pattern = "line\\d+")), function(x) sapply(x, table))))
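To see what that one-liner is doing, here is the same idea unpacked on a small toy example. The vectors v1 and v2 and the names indicator, total and result are made up for illustration and are not from the question.

```r
# Two short factors that share the same levels (toy data)
lv <- c("A", "B", "C")
v1 <- factor(c("A", "B", "B", "C"), levels = lv)
v2 <- factor(c("A", "C", "B", "C"), levels = lv)
vecs <- list(v1 = v1, v2 = v2)

# For each vector, tabulate every single element against all levels.
# The result is a 0/1 matrix with levels as rows and positions as columns.
indicator <- lapply(vecs, function(x) sapply(x, table))

# Adding the matrices elementwise gives counts per level and position
total <- Reduce("+", indicator)

# Transpose so that positions are rows and levels are columns
result <- t(total)
result
```

This relies on the inputs being factors with identical levels, so that every per-element table has the same length and order. With plain character vectors, each table would contain only the one value present and the matrices would no longer line up.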

Upvotes: 3
