Reputation: 1525
I would like to convert a matrix/array (with dimnames) into a data frame. This can be done very easily using reshape2::melt but seems harder with tidyr, and in fact not really possible in the case of an array. Am I missing something? (In particular since reshape2 describes itself as being retired; see https://github.com/hadley/reshape).
For example, given the following matrix
MyScores <- matrix(runif(2*3), nrow = 2, ncol = 3,
dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3]))
we can turn it into a data frame as follows
reshape2::melt(MyScores, value.name = 'Score') # perfect
or, using tidyr, as follows:
as_tibble(MyScores, rownames = 'Month') %>%
gather(Class, Score, -Month)
In this case reshape2 and tidyr seem similar (although reshape2 is shorter if you are looking for a long-format data frame).
However for arrays, it seems harder. Given
EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5),
dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))
we can turn it into a data frame as follows:
reshape2::melt(EverybodyScores, value.name = 'Score') # perfect
but using tidyr it's not clear how to do it:
as_tibble(EverybodyScores, rownames = 'Month') # loses month information, and Class and StudentID still need to be disentangled
Is this a situation where the right solution is to stick to using reshape2?
Upvotes: 7
Views: 2046
Reputation: 15062
One way I just found by playing around is to coerce via tbl_cube. I have never really used the class but it seems to do the trick in this instance.
EverybodyScores <- array(
runif(2 * 3 * 5),
dim = c(2, 3, 5),
dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)
library(tidyverse)
library(cubelyr)
EverybodyScores %>%
as.tbl_cube(met_name = "Score") %>%
as_tibble
#> # A tibble: 30 x 4
#> Month Class StudentID Score
#> <chr> <chr> <int> <dbl>
#> 1 January A 1 0.366
#> 2 February A 1 0.254
#> 3 January B 1 0.441
#> 4 February B 1 0.562
#> 5 January C 1 0.313
#> 6 February C 1 0.192
#> 7 January A 2 0.799
#> 8 February A 2 0.277
#> 9 January B 2 0.631
#> 10 February B 2 0.101
#> # ... with 20 more rows
Created on 2018-08-15 by the reprex package (v0.2.0).
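The same idea of coercing through an intermediate class also seems to work in base R, without cubelyr: go via the "table" class and let as.data.frame do the melting. This is only a sketch of that swapped-in technique, not the tbl_cube route above; the name scores_long and the fixed seed are my own choices for illustration.

```r
# Assumption: a fixed seed, only so the example is reproducible.
set.seed(42)

EverybodyScores <- array(
  runif(2 * 3 * 5),
  dim = c(2, 3, 5),
  dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)

# as.table() only adds the "table" class; as.data.frame() on a table then
# produces one row per cell, with one column per named dimension.
scores_long <- as.data.frame(
  as.table(EverybodyScores),
  responseName = "Score",      # name of the value column
  stringsAsFactors = FALSE     # keep dimension labels as character
)

# Dimnames are always stored as character, so convert the ID back by hand.
scores_long$StudentID <- as.integer(scores_long$StudentID)

head(scores_long)
```

The rows come out in the array's own storage order (first dimension varying fastest), so the Score column should line up with as.vector(EverybodyScores).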
Upvotes: 2
Reputation: 4879
Here is the new tidyr way to do the same:
library(tidyr)
EverybodyScores <- array(
runif(2 * 3 * 5),
dim = c(2, 3, 5),
dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)
as_tibble(EverybodyScores, rownames = "Month") %>%
pivot_longer(
cols = matches("^A|^B|^C"),
names_sep = "\\.",
names_to = c("Class", "StudentID")
)
#> # A tibble: 30 x 4
#> Month Class StudentID value
#> <chr> <chr> <chr> <dbl>
#> 1 January A 1 0.0325
#> 2 January B 1 0.959
#> 3 January C 1 0.593
#> 4 January A 2 0.0702
#> 5 January B 2 0.882
#> 6 January C 2 0.918
#> 7 January A 3 0.459
#> 8 January B 3 0.849
#> 9 January C 3 0.901
#> 10 January A 4 0.328
#> # … with 20 more rows
Created on 2021-02-23 by the reprex package (v1.0.0)
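If you also want the value column named Score and StudentID as an integer (as reshape2::melt would give you), pivot_longer has values_to and names_transform arguments that should cover it. A minimal sketch, assuming tidyr 1.1.0 or later; I go through as.data.frame plus tibble::rownames_to_column here to be sure the month labels are kept, and the names wide and long are just placeholders.

```r
library(tidyr)
library(tibble)

set.seed(1)  # assumption: fixed seed for reproducibility only
EverybodyScores <- array(
  runif(2 * 3 * 5),
  dim = c(2, 3, 5),
  dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)

# Flatten to a wide data frame: rows are months, columns are "A.1", "B.1", ...
wide <- rownames_to_column(as.data.frame(EverybodyScores), "Month")

long <- pivot_longer(
  wide,
  cols = -Month,                                   # everything except the month column
  names_to = c("Class", "StudentID"),
  names_sep = "\\.",                               # split "A.1" into "A" and "1"
  names_transform = list(StudentID = as.integer),  # "1" -> 1L
  values_to = "Score"
)

long
```

Using cols = -Month instead of matches("^A|^B|^C") avoids hard-coding the class labels, which matters if the classes are not known in advance.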
Upvotes: 2
Reputation: 16832
Making a tibble drops the row names, but instead of going straight into a tibble, you can first convert the array into a base R data.frame, then use tibble::rownames_to_column to make a column for months. Notice that converting to a data frame creates columns with names like A.1, sticking the class and ID together; you can separate these again with tidyr::separate. Calling as_tibble is optional, only needed if you want the result to be a tibble, and it can come at any point in the workflow once you've made a column from the row names.
library(tidyverse)
EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5),
dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))
EverybodyScores %>%
as.data.frame() %>%
rownames_to_column("Month") %>%
gather(key = class_id, value = value, -Month) %>%
separate(class_id, into = c("Class", "StudentID"), sep = "\\.") %>%
as_tibble()
#> # A tibble: 30 x 4
#> Month Class StudentID value
#> <chr> <chr> <chr> <dbl>
#> 1 January A 1 0.576
#> 2 February A 1 0.229
#> 3 January B 1 0.930
#> 4 February B 1 0.547
#> 5 January C 1 0.761
#> 6 February C 1 0.468
#> 7 January A 2 0.631
#> 8 February A 2 0.893
#> 9 January B 2 0.638
#> 10 February B 2 0.735
#> # ... with 20 more rows
Created on 2018-08-15 by the reprex package (v0.2.0).
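Since the class and ID get glued together and split apart again, it may be worth checking that every row still points at the right cell of the array. Below is a rough sketch of such a check using the same gather/separate steps; convert = TRUE in separate is used so StudentID comes back as an integer, and the names wide, long and lookup are made up for the example.

```r
library(tidyr)
library(tibble)

set.seed(2)  # assumption: fixed seed for reproducibility only
EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5),
                         dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))

# Same steps as above, written without pipes.
wide <- rownames_to_column(as.data.frame(EverybodyScores), "Month")
long <- gather(wide, key = "class_id", value = "Score", -Month)
long <- separate(long, class_id, into = c("Class", "StudentID"),
                 sep = "\\.", convert = TRUE)  # convert = TRUE turns "1" into 1L

# Look each row up in the original array by its three labels.
# A character matrix with one column per dimension indexes by dimnames.
lookup <- EverybodyScores[cbind(long$Month, long$Class, as.character(long$StudentID))]

# Stops with an error if any row was matched to the wrong cell.
stopifnot(isTRUE(all.equal(lookup, long$Score)))
```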
Upvotes: 2