nalzok

Converting data.frame to high-dimensional matrix

For example, consider the following data:

> sample.df
  f1 f2   x1   x2   x3
1  2  2 7.28 9.40 5.02
2  1  1 6.30 9.56 3.74
3  2  1 6.88 8.72 3.14
4  1  2 6.68 9.58 3.84
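To make the examples reproducible, here is a reconstruction of sample.df from the printed output above. Storing f1 and f2 as integers is an assumption, since the printout does not show column types.

```r
# Rebuild sample.df from the printed table (f1/f2 stored as integer codes)
sample.df <- data.frame(f1 = c(2L, 1L, 2L, 1L),
                        f2 = c(2L, 1L, 1L, 2L),
                        x1 = c(7.28, 6.30, 6.88, 6.68),
                        x2 = c(9.40, 9.56, 8.72, 9.58),
                        x3 = c(5.02, 3.74, 3.14, 3.84))
```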

I wonder how to write MAGIC so that

> sample.matrix <- MAGIC(sample.df)
> sample.matrix[1, 1, ]
[1] 6.30 9.56 3.74
> sample.matrix[1, 2, ]
[1] 6.68 9.58 3.84

Basically, sample.matrix[x, y, ] selects the row of the data frame given by sample.df[sample.df$f1 == x & sample.df$f2 == y, ] and then removes the redundant columns holding the values of f1 and f2. Note that each combination of (f1, f2) appears exactly once in the data frame.
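As a correctness reference, the specification above can be transcribed almost literally into a slow loop. This is only a sketch: magic_naive is a hypothetical name, and it assumes f1 and f2 are positive integer codes starting at 1. It scans the whole data frame once per cell, so it costs roughly O(n1 * n2 * n).

```r
sample.df <- data.frame(f1 = c(2L, 1L, 2L, 1L), f2 = c(2L, 1L, 1L, 2L),
                        x1 = c(7.28, 6.30, 6.88, 6.68),
                        x2 = c(9.40, 9.56, 8.72, 9.58),
                        x3 = c(5.02, 3.74, 3.14, 3.84))

# Hypothetical reference implementation: one subset per (f1, f2) cell.
magic_naive <- function(df) {
  n1 <- max(df$f1)
  n2 <- max(df$f2)
  k <- ncol(df) - 2                     # value columns after dropping f1, f2
  out <- array(NA_real_, dim = c(n1, n2, k))
  for (x in seq_len(n1)) {
    for (y in seq_len(n2)) {
      row <- df[df$f1 == x & df$f2 == y, -(1:2), drop = FALSE]
      if (nrow(row) == 1) out[x, y, ] <- unlist(row)
    }
  }
  out
}

sample.matrix <- magic_naive(sample.df)
sample.matrix[1, 1, ]
# [1] 6.30 9.56 3.74
```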

My first thought was as.matrix followed by dim<-, but the rows in the data frame are not sorted. Sorting them would take O(n * log(n)), but since I just want to build a lookup table, the time complexity could in theory be bounded by O(n).
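For comparison, the as.matrix plus dim<- idea does work once the rows are sorted with f1 varying fastest, because R fills arrays in column-major order. A minimal sketch, assuming every (f1, f2) pair occurs exactly once and the codes are 1, 2, ... (magic_sorted is a hypothetical name). The order() call is where the O(n * log(n)) cost comes from.

```r
sample.df <- data.frame(f1 = c(2L, 1L, 2L, 1L), f2 = c(2L, 1L, 1L, 2L),
                        x1 = c(7.28, 6.30, 6.88, 6.68),
                        x2 = c(9.40, 9.56, 8.72, 9.58),
                        x3 = c(5.02, 3.74, 3.14, 3.84))

magic_sorted <- function(df) {
  o <- order(df$f2, df$f1)             # f2 is the primary key, f1 the secondary,
                                       # so f1 varies fastest (column-major)
  vals <- as.matrix(df[o, -(1:2), drop = FALSE])
  n1 <- length(unique(df$f1))
  n2 <- length(unique(df$f2))
  dim(vals) <- c(n1, n2, ncol(vals))   # dim<- also drops the matrix dimnames
  vals
}

result <- magic_sorted(sample.df)
result[1, 2, ]
# [1] 6.68 9.58 3.84
```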

Ideally, the solution would exploit vectorization.
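One way to avoid the sort, sketched here as a swapped-in technique rather than anything taken from the answers below, is to assign into a preallocated array with a three-column index matrix, which is fully vectorized. magic is a hypothetical stand-in for MAGIC, and it assumes f1 and f2 are positive integer codes; cells for pairs that never occur would stay NA.

```r
sample.df <- data.frame(f1 = c(2L, 1L, 2L, 1L), f2 = c(2L, 1L, 1L, 2L),
                        x1 = c(7.28, 6.30, 6.88, 6.68),
                        x2 = c(9.40, 9.56, 8.72, 9.58),
                        x3 = c(5.02, 3.74, 3.14, 3.84))

magic <- function(df) {
  vals <- as.matrix(df[, -(1:2), drop = FALSE])
  n <- nrow(vals)
  k <- ncol(vals)
  out <- array(NA_real_, dim = c(max(df$f1), max(df$f2), k))
  # One index row per cell of vals, in column-major order:
  # (f1[i], f2[i], j) for i = 1..n within each column j = 1..k.
  idx <- cbind(rep(df$f1, times = k),
               rep(df$f2, times = k),
               rep(seq_len(k), each = n))
  out[idx] <- vals                     # vals is read column by column
  out
}

sample.matrix <- magic(sample.df)
sample.matrix[1, 1, ]
# [1] 6.30 9.56 3.74
sample.matrix[1, 2, ]
# [1] 6.68 9.58 3.84
```

There is no sorting step here: the work is proportional to the number of values plus the size of the output array.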

Upvotes: 4

Answers (2)

Ronak Shah

EDIT

After re-reading the question, I think we can use split without ordering, which avoids the sorting step. Since each (f1, f2) pair is unique across rows, we can do

split(sample.df[, -(1:2)], list(sample.df$f1, sample.df$f2))


#$`1.1`
#   x1   x2   x3
#2 6.3 9.56 3.74

#$`2.1`
#    x1   x2   x3
#3 6.88 8.72 3.14

#$`1.2`
#    x1   x2   x3
#4 6.68 9.58 3.84

#$`2.2`
#    x1  x2   x3
#1 7.28 9.4 5.02
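Building on this, the list returned by split can be packed into the 3-D array from the question. Its groups come out with f1 varying fastest ("1.1", "2.1", "1.2", "2.2"), which is column-major order for an [f1, f2] grid. A sketch, assuming every (f1, f2) pair occurs exactly once (an empty group would shift the values):

```r
sample.df <- data.frame(f1 = c(2L, 1L, 2L, 1L), f2 = c(2L, 1L, 1L, 2L),
                        x1 = c(7.28, 6.30, 6.88, 6.68),
                        x2 = c(9.40, 9.56, 8.72, 9.58),
                        x3 = c(5.02, 3.74, 3.14, 3.84))

s <- split(sample.df[, -(1:2)], list(sample.df$f1, sample.df$f2))
n1 <- length(unique(sample.df$f1))
n2 <- length(unique(sample.df$f2))
k <- ncol(sample.df) - 2
flat <- unlist(lapply(s, unlist), use.names = FALSE)   # k values per group
# Fill as [column, f1, f2], then move the column dimension last
arr <- aperm(array(flat, dim = c(k, n1, n2)), c(2, 3, 1))
arr[1, 2, ]
# [1] 6.68 9.58 3.84
```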

Original Answer

I am not exactly clear about the goal, but one way is to order sample.df by f1 and f2 and then subset using Map:

new_df <- sample.df[with(sample.df, order(f1, f2)),]

Map(function(x, y) new_df[with(new_df, f1 == x & f2 == y), -(1:2)],
    new_df$f1, new_df$f2)

#[[1]]
#   x1   x2   x3
#2 6.3 9.56 3.74

#[[2]]
#    x1   x2   x3
#4 6.68 9.58 3.84

#[[3]]
#    x1   x2   x3
#3 6.88 8.72 3.14

#[[4]]
#    x1  x2   x3
#1 7.28 9.4 5.02

If the above is your expected output, then each row of new_df is already what you want. If you want the rows as a separate list, we can also split new_df row by row:

split(new_df[, -(1:2)], seq_len(nrow(new_df)))

which gives the same output, as a list named "1" to "4".
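If you would rather index new_df as sample.matrix[x, y, ] directly, note that ordering by f1 then f2 makes f2 vary fastest down the rows. A sketch, assuming a complete grid of (f1, f2) pairs: reshape with the f2 dimension first, then swap the first two dimensions with aperm.

```r
sample.df <- data.frame(f1 = c(2L, 1L, 2L, 1L), f2 = c(2L, 1L, 1L, 2L),
                        x1 = c(7.28, 6.30, 6.88, 6.68),
                        x2 = c(9.40, 9.56, 8.72, 9.58),
                        x3 = c(5.02, 3.74, 3.14, 3.84))
new_df <- sample.df[with(sample.df, order(f1, f2)), ]

n1 <- length(unique(new_df$f1))
n2 <- length(unique(new_df$f2))
arr <- as.matrix(new_df[, -(1:2)])
dim(arr) <- c(n2, n1, ncol(arr))   # rows are (f2 fastest, then f1)
arr <- aperm(arr, c(2, 1, 3))      # -> [f1, f2, column]
arr[1, 2, ]
# [1] 6.68 9.58 3.84
```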

Upvotes: 1

Sotos

Here is an idea using a matrix. Note that this is not exactly the output you require, but it can easily be transformed.

Assuming df is your sample.df,

m1 <- matrix(do.call(paste, df[with(df, order(f1, f2)), -c(1, 2)]), nrow = 2, byrow = TRUE)
m1[1, 2]
#[1] "6.68 9.58 3.84"
m1[1, 1]
#[1] "6.3 9.56 3.74"
m1[2, 1]
#[1] "6.88 8.72 3.14"
m1[2, 2]
#[1] "7.28 9.4 5.02"

You can then get them back as numeric vectors by splitting the strings, e.g.

as.numeric(strsplit(m1[1, 2], ' ')[[1]])
#[1] 6.68 9.58 3.84
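To convert all cells at once rather than one at a time, strsplit can be applied to the whole of m1 (it walks the matrix in column-major order) and the result packed into a 3-D array. A sketch, assuming every cell holds the same number of values; note that the round trip through paste and as.numeric goes via character strings, so it may not reproduce every double exactly.

```r
df <- data.frame(f1 = c(2L, 1L, 2L, 1L), f2 = c(2L, 1L, 1L, 2L),
                 x1 = c(7.28, 6.30, 6.88, 6.68),
                 x2 = c(9.40, 9.56, 8.72, 9.58),
                 x3 = c(5.02, 3.74, 3.14, 3.84))
m1 <- matrix(do.call(paste, df[with(df, order(f1, f2)), -c(1, 2)]),
             nrow = 2, byrow = TRUE)

parts <- strsplit(m1, " ", fixed = TRUE)   # list in column-major order of m1
k <- length(parts[[1]])                    # number of values per cell
arr <- array(as.numeric(unlist(parts)), dim = c(k, nrow(m1), ncol(m1)))
arr <- aperm(arr, c(2, 3, 1))              # -> [f1, f2, column]
arr[2, 1, ]
# [1] 6.88 8.72 3.14
```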

Upvotes: 3