Chris
Chris

Reputation: 2081

Transpose dataframe without t(), and using spread(), cast(), or melt()

I need to transpose a dataframe without using t() as I want to avoid transforming it into a matrix. So, I'm using the solution from here:

mydata <- data.table(col0=c("row1","row2","row3"),
                     col1=c(11,21,31),
                     col2=c(12,22,32),
                     col3=c(13,23,33))

mydata
# col0 col1 col2 col3
# row1   11   12   13
# row2   21   22   23
# row3   31   32   33

dcast(melt(mydata, id.vars = "col0"), variable ~ col0)
#    variable row1 row2 row3
# 1:     col1   11   21   31
# 2:     col2   12   22   32
# 3:     col3   13   23   33

And I use the same logic with the data I'm using:

x <- merge(as.data.frame(table(mtcars$mpg)), as.data.frame(round(prop.table(table(mtcars$mpg)),2)), by="Var1", all.x=TRUE)
data.table::dcast(data.table::melt(x, id.vars = "Var1"), variable ~ Var1)

It works! But it gives me a warning and a "future error":

Warning message in data.table::melt(x, id.vars = "Var1"): “The melt
generic in data.table has been passed a data.frame and will attempt to
redirect to the relevant reshape2 method; please note that reshape2 is
deprecated, and this redirection is now deprecated as well. To
continue using melt methods from reshape2 while both libraries are
attached, e.g. melt.list, you can prepend the namespace like
reshape2::melt(x). In the next version, this warning will become an
error.” Warning message in data.table::dcast(data.table::melt(x,
id.vars = "Var1"), variable ~ : “The dcast generic in data.table has
been passed a data.frame and will attempt to redirect to the
reshape2::dcast; please note that reshape2 is deprecated, and this
redirection is now deprecated as well. Please do this redirection
yourself like reshape2::dcast(data.table::melt(x, id.vars = "Var1")).
In the next version, this warning will become an error.”

Also, I have been trying to transpose the dataframe using a solution from here using dplyr::spread() but it seems to be far more complicated than the solution from data.table package (when the value columns are more than 1, as in this case). I'm more used to dplyr() and tidyverse() but data.table solution is far simpler to just ignore it.

Additional Information.

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GGally_1.4.0       forcats_0.4.0      stringr_1.4.0      dplyr_0.8.3       
 [5] purrr_0.3.2        readr_1.3.1        tidyr_1.0.0        tibble_2.1.3      
 [9] ggplot2_3.2.1.9000 tidyverse_1.2.1    bigrquery_1.2.0    httr_1.4.1        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7          jsonlite_1.6         splines_3.6.0       
 [4] modelr_0.1.4         Formula_1.2-3        assertthat_0.2.1    
 [7] getPass_0.2-2        latticeExtra_0.6-28  cellranger_1.1.0    
[10] pillar_1.4.2         backports_1.1.5      lattice_0.20-38     
[13] glue_1.3.1           uuid_0.1-2           digest_0.6.21       
[16] checkmate_1.9.4      RColorBrewer_1.1-2   rvest_0.3.4         
[19] colorspace_1.4-1     htmltools_0.4.0      Matrix_1.2-17       
[22] plyr_1.8.4           psych_1.8.12         pkgconfig_2.0.3     
[25] broom_0.5.2          haven_2.1.1          scales_1.0.0        
[28] htmlTable_1.13.2     generics_0.0.2       withr_2.1.2         
[31] repr_1.0.1.9000      skimr_1.0.7          nnet_7.3-12         
[34] cli_1.1.0            mnormt_1.5-5         survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         readxl_1.3.1        
[40] evaluate_0.14        fs_1.3.1             nlme_3.1-141        
[43] xml2_1.2.2           foreign_0.8-72       data.table_1.12.4   
[46] tools_3.6.0          hms_0.5.1            gargle_0.4.0        
[49] lifecycle_0.1.0      munsell_0.5.0        cluster_2.1.0       
[52] compiler_3.6.0       rlang_0.4.0          grid_3.6.0          
[55] pbdZMQ_0.3-3         IRkernel_1.0.2.9000  rstudioapi_0.10     
[58] htmlwidgets_1.5.1    base64enc_0.1-3      gtable_0.3.0        
[61] DBI_1.0.0            reshape_0.8.8        reshape2_1.4.3      
[64] R6_2.4.0             gridExtra_2.3        lubridate_1.7.4     
[67] knitr_1.25           bit_1.1-14           zeallot_0.1.0       
[70] Hmisc_4.2-0          stringi_1.4.3        parallel_3.6.0      
[73] IRdisplay_0.7.0.9000 Rcpp_1.0.2           vctrs_0.2.0         
[76] rpart_4.1-15         acepack_1.4.1        xfun_0.10           
[79] tidyselect_0.2.5    

Upvotes: 3

Views: 3630

Answers (3)

M--
M--

Reputation: 29203

You need to make sure that you are passing a data.table object to data.table::melt and data.table::dcast.

x<-merge(as.data.frame(table(mtcars$mpg)),
        as.data.frame(round(prop.table(table(mtcars$mpg)),2)), 
        by="Var1", all.x=TRUE)

data.table::dcast(data.table::melt(data.table::setDT(x), id.vars = "Var1"), 
                  variable ~ Var1)

Warning(s):

You see that by using data.table::setDT, that "future error" is resolved.

#> Warning in melt.data.table(data.table::setDT(x), id.vars = "Var1"):
#> 'measure.vars' [Freq.x, Freq.y] are not all of the same type. By order
#> of hierarchy, the molten data value column will be of type 'double'. All
#> measure variables not of type 'double' will be coerced too. Check DETAILS
#> in ?melt.data.table for more on coercion.

Output:

#>    variable 10.4 13.3 14.3 14.7   15 15.2 15.5 15.8 16.4 17.3 17.8 18.1
#> 1:   Freq.x 2.00 1.00 1.00 1.00 1.00 2.00 1.00 1.00 1.00 1.00 1.00 1.00
#> 2:   Freq.y 0.06 0.03 0.03 0.03 0.03 0.06 0.03 0.03 0.03 0.03 0.03 0.03
#>    18.7 19.2 19.7   21 21.4 21.5 22.8 24.4   26 27.3 30.4 32.4 33.9
#> 1: 1.00 2.00 1.00 2.00 2.00 1.00 2.00 1.00 1.00 1.00 2.00 1.00 1.00
#> 2: 0.03 0.06 0.03 0.06 0.06 0.03 0.06 0.03 0.03 0.03 0.06 0.03 0.03

P.S. I could not get the error reproduced in data.table_1.12.2 and had to update to data.table_1.12.6.

Upvotes: 3

IceCreamToucan
IceCreamToucan

Reputation: 28705

I need to transpose a dataframe without using t() as I want to avoid transforming it into a matrix.

If your only requirement is to avoid coercing your data frame into a matrix, you can use data.table::transpose which requires version >= 1.12.4

data.table::transpose(
  mydata, 
  keep.names = 'variable', 
  make.names = names(mydata)[1])

#    variable row1 row2 row3
# 1:     col1   11   21   31
# 2:     col2   12   22   32
# 3:     col3   13   23   33

Upvotes: 5

DHW
DHW

Reputation: 1267

The new tidyr 1.0 functions make this a lot easier:

library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

mydata <- tibble(col0=c("row1","row2","row3"),
                 col1=c(11,21,31),
                 col2=c(12,22,32),
                 col3=c(13,23,33))

# First collect all the values in the one column
(new_data <- mydata %>% pivot_longer(col1:col3))
#> # A tibble: 9 x 3
#>   col0  name  value
#>   <chr> <chr> <dbl>
#> 1 row1  col1     11
#> 2 row1  col2     12
#> 3 row1  col3     13
#> 4 row2  col1     21
#> 5 row2  col2     22
#> 6 row2  col3     23
#> 7 row3  col1     31
#> 8 row3  col2     32
#> 9 row3  col3     33

# Col0 is what we want the new column names to come from, so:
(new_data %<>% pivot_wider(names_from = col0))
#> # A tibble: 3 x 4
#>   name   row1  row2  row3
#>   <chr> <dbl> <dbl> <dbl>
#> 1 col1     11    21    31
#> 2 col2     12    22    32
#> 3 col3     13    23    33

So with your mtcars use case:

library(tidyverse)

(x <- 
    mtcars %>% 
    group_by(mpg) %>% 
    summarize(Freq.x = n(), 
              Freq.y = Freq.x/nrow(.)) %>% 
    pivot_longer(-mpg) %>% 
    pivot_wider(names_from = mpg))
#> # A tibble: 2 x 26
#>   name  `10.4` `13.3` `14.3` `14.7`   `15` `15.2` `15.5` `15.8` `16.4`
#>   <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1 Freq~ 2      1      1      1      1      2      1      1      1     
#> 2 Freq~ 0.0625 0.0312 0.0312 0.0312 0.0312 0.0625 0.0312 0.0312 0.0312
#> # ... with 16 more variables: `17.3` <dbl>, `17.8` <dbl>, `18.1` <dbl>,
#> #   `18.7` <dbl>, `19.2` <dbl>, `19.7` <dbl>, `21` <dbl>, `21.4` <dbl>,
#> #   `21.5` <dbl>, `22.8` <dbl>, `24.4` <dbl>, `26` <dbl>, `27.3` <dbl>,
#> #   `30.4` <dbl>, `32.4` <dbl>, `33.9` <dbl>

I know basically nothing about data.table, but this gives me no red.

Now, if your values aren't all the same type, this will still give you issues - because it's still at some point stacking all the values into one column - so I was going to suggest a nest() approach. But then I realized... if you want to transpose the thing, and the rows aren't all the same value types, then you're ultimately trying to get values of different types into one column, aren't you? So some homogenizing conversion will be unavoidable.

Created on 2019-10-22 by the reprex package (v0.3.0)

Upvotes: 2

Related Questions