Flash

Reputation: 16703

Named arrays, dataframes and matrices

If I split the rows of my data matrix according to class labels in another vector y, like this, the result is a named list like this:

> X <- matrix(c(1,2,3,4,5,6,7,8),nrow=4,ncol=2)
> y <- c(1,3,1,3)
> X_split <- split(as.data.frame(X), y)
> X_split
$`1`
  V1 V2
1  1  5
3  3  7

$`3`
  V1 V2
2  2  6
4  4  8

I want to loop through the results and do some operation on each matrix, for example sum all of its elements or sum its columns. How do I access each matrix in the loop so that I can do that?

labels = names(X_split)
for (k in labels) {
    # How do I get X_split[k] as a matrix?
    sum_class = sum(X_split[k]) # Doesn't work
}

In fact, I don't really want to deal with data frames and named arrays at all. Is there a way to call split without as.data.frame and get a list of matrices, or something similar?

Upvotes: 1

Views: 159

Answers (3)

konvas

Reputation: 14346

To split without converting to a data frame:

X_split <- list(X[c(1, 3), ], X[c(2, 4), ]) 

More generally, given a vector y of length nrow(X) indicating the group each row belongs to, you can write this as

X_split <- lapply(unique(y), function(i) X[y == i, , drop = FALSE])  # drop = FALSE keeps a single-row group as a matrix
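A base R alternative that also keeps the class labels as list names is split.data.frame(): despite its name, it accepts a matrix, splits its rows by y with drop = FALSE, and returns a list of matrices. A minimal sketch using the question's data:

```r
# The question's example data
X <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)
y <- c(1, 3, 1, 3)

# split.data.frame() splits the rows of X by y without converting X;
# each element is still a matrix, and the list is named by the values of y
X_split <- split.data.frame(X, y)

names(X_split)   # "1" "3"
X_split[["1"]]   # rows 1 and 3 of X, as a 2 x 2 matrix
```

This avoids both as.data.frame and the manual unique(y) bookkeeping, and lapply/sapply can then be applied to the result exactly as below.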

To sum the results

X_sum <- lapply(X_split, sum)

# [[1]]
# [1] 16

# [[2]]
# [1] 20

(or use sapply if you want the result as a vector)
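The loop in the question fails because X_split[k] (single brackets) returns a one-element list, while X_split[[k]] (double brackets) extracts the element itself. A minimal sketch of that loop over a named list of matrices, collecting both the total and the column sums for each class (the names class_sum and class_colsums are illustrative only):

```r
X <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)
y <- c(1, 3, 1, 3)

# Named list of matrices, one per class label ("1" and "3")
X_split <- lapply(split(seq_len(nrow(X)), y),
                  function(idx) X[idx, , drop = FALSE])

class_sum <- numeric(0)
class_colsums <- list()
for (k in names(X_split)) {
  m <- X_split[[k]]                 # [[ ]] returns the matrix itself
  class_sum[k] <- sum(m)            # total of all elements in class k
  class_colsums[[k]] <- colSums(m)  # one sum per column of class k
}

class_sum  # named vector with 16 for class "1" and 20 for class "3"
```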

Upvotes: 3

Frank

Reputation: 66819

I'm fairly sure that operating directly on the matrix, without splitting it at all, is the most efficient approach:

tapply(rowSums(X), y, sum)
#  1  3 
# 16 20 
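The question also mentions summing columns per class. Base R's rowsum() handles that on the matrix directly: it adds up the rows of X within each group of y, giving one row per class and one column per column of X. A minimal sketch:

```r
X <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2)
y <- c(1, 3, 1, 3)

# One row per class label (row names "1" and "3"), one column per column of X
col_sums_by_class <- rowsum(X, y)

# Summing each of those rows gives the per-class totals, as tapply does above
total_by_class <- rowSums(col_sums_by_class)
```

Here col_sums_by_class holds 4 and 12 for class "1" and 6 and 14 for class "3", and total_by_class matches the tapply result.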

Upvotes: 1

David Arenburg

Reputation: 92292

Another option is not to split in the first place and just sum per y. Here's a possible data.table approach:

library(data.table)
as.data.table(X)[, sum(sapply(.SD, sum)), by = y]
#    y V1
# 1: 1 16
# 2: 3 20

Upvotes: 3
