Taufi

Reputation: 1577

Sum of different values over different columns in R

Let's say I have the following form of data in a dataframe in R:

Property 1 | Property 2 | ... | Property n
    A            B                 R
    C            A                 S 
    D            F                 C
    .            .                 . 
    .            .                 . 
    .            .                 . 
    R            Z                 X 

where each cell of the n properties can take any of the letters A to Z. For each row, I would like to count how many times each of the 26 letters appears in that row and store those counts in new columns next to Property n. For example, if the first row contains A seven times, B six times, C zero times, and so on, the code should give me the following table:

Property 1 | Property 2 | ... | Property n | A | B | C | ... | Z 
    A            B                 R         7   6   0   ...   2
    C            A                 S       
    D            F                 C
    .            .                 . 
    .            .                 . 
    .            .                 . 
    R            Z                 X 

Is there a function in R that does that? Although it would be slow, I thought I could write a loop over each letter and each row, of the form

x <- vector(length=nrow(tr))
for (i in 1:nrow(tr)) {
x[i] <- count(tr[i,], vars="A")
}

But then I get the error

Error in unique.default(x) : 
unique() can only be applied to vectors

or, even worse, if "A" does not appear even once among the n properties, I get the error

 Error in eval(expr, envir, enclos) : object 'A' not found

What is a possible solution here?

Upvotes: 1

Views: 43

Answers (1)

Mike H.

Reputation: 14360

You could use an lapply with rowSums to do this rather quickly. I generated some fake data using only three "Properties".

set.seed(1)
df <- data.frame(Property1 = sample(LETTERS, 6), Property2 = sample(LETTERS, 6), Property3 = sample(LETTERS, 6))

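# For each letter, count per row how many cells equal it,
# and store the counts in a new column named after that letter
df[,LETTERS] <- lapply(LETTERS, function(x) rowSums(df==x))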

A snippet of the result looks like:

df[,c(1:6)]
  Property1 Property2 Property3 A B C
1         J         G         M 0 0 0
2         T         J         O 0 0 0
3         W         A         L 1 0 0
4         E         I         E 0 0 0
5         O         T         S 0 0 0
6         C         H         Y 0 0 1
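
As a possible alternative, here is a minimal sketch that gets the same per-row counts with table() on a factor whose levels are fixed to LETTERS, so letters that never occur in a row still get a count of 0. The toy data frame and the names toy, counts_table, counts_rowsums and result are assumptions made for illustration only; they are not part of the data above.

```r
# Toy data (hypothetical): three properties, three rows
toy <- data.frame(
  Property1 = c("A", "C", "D"),
  Property2 = c("B", "A", "F"),
  Property3 = c("R", "S", "C"),
  stringsAsFactors = FALSE
)

# Tabulate each row against all 26 letters; fixing the factor levels
# keeps a zero count for letters that do not occur in the row.
counts_table <- t(apply(toy, 1, function(row) table(factor(row, levels = LETTERS))))

# The same counts via the rowSums approach, for comparison
counts_rowsums <- sapply(LETTERS, function(x) rowSums(toy == x))

# Append the 26 count columns next to the properties
result <- cbind(toy, counts_table)
```

Both routes should give the same 3 x 26 matrix of counts here. The rowSums version does one vectorised comparison per letter, while the table() version does one tabulation per row, so the former is likely to scale better when there are many rows.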

Upvotes: 2
