Leprechault
Leprechault

Reputation: 1823

Rearrange dataframe using dcast using a dummy

I've like to use dcast function (reshape2 package) for returns a reshape data frame, but doesn't work. In my example:

#Data set
X<-c(804519.4,804519.6,804519.6,804519.4,804519.4,804519.4,804519.6,804519.6,804519.4,804519.4)
Y<-c(7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833)
band<-c("band1","band1","band1","band1","band1","band2","band2","band2","band2","band2")# My original data set are 31 bands
reflec<-c(9.608848,10.504454,8.648237,9.935091,11.282750,9.608848,10.504454,8.648237,9.935091,11.282750)
dummy<-1:10
RES3<-data.frame(X,Y,band,reflec,dummy)
RES3
          X       Y  band    reflec dummy
1  804519.4 7673833 band1  9.608848     1
2  804519.6 7673833 band1 10.504454     2
3  804519.6 7673833 band1  8.648237     3
4  804519.4 7673833 band1  9.935091     4
5  804519.4 7673833 band1 11.282750     5
6  804519.4 7673833 band2  9.608848     6
7  804519.6 7673833 band2 10.504454     7
8  804519.6 7673833 band2  8.648237     8
9  804519.4 7673833 band2  9.935091     9
10 804519.4 7673833 band2 11.282750    10
RES3<-as.data.frame(RES3)
colnames(RES3)<-c("X","Y","band","reflec","dummy")
dcast(RES3, X + Y + dummy ~ band,
         fun.aggregate = length,
         value.var="reflec")

Doesn't work, my output is:

          X       Y dummy band1 band2
1  804519.4 7673833     1     1     0
2  804519.4 7673833     4     1     0
3  804519.4 7673833     5     1     0
4  804519.4 7673833     6     0     1
5  804519.4 7673833     9     0     1
6  804519.4 7673833    10     0     1
7  804519.6 7673833     2     1     0
8  804519.6 7673833     3     1     0
9  804519.6 7673833     7     0     1
10 804519.6 7673833     8     0     1

I expected:

         X       Y    band1    band2      
1 804519.4 7673833    9.608848 9.608848    
2 804519.6 7673833   10.504454 10.504454   
3 804519.6 7673833    8.648237  8.648237   
4 804519.4 7673833    9.935091  9.935091   
5 804519.4 7673833   11.282750 11.282750   

Any member could help me, because my original data set are 31 bands as levels and I've like to convert in columns? Thanks!

Upvotes: 2

Views: 308

Answers (3)

camille
camille

Reputation: 16832

Like I mentioned, reshape2 has been deprecated in favor of tidyr within the tidyverse packages. In my opinion (and that of the packages' authors), tidyr's spread and gather are a bit more clear than reshape2's cast and melt—no formula notation, cleaner ways to specify values. (Some context for that is here).

Also as I mentioned, you've got a couple extra lines—data.frame will add the column names according to the names of the vectors going into it.

I've updated this answer to match the new data you posted. My original solution worked for your original data, but needs some more steps for your new data, which I'm doing with dplyr functions.

I don't totally get the dummy column at this point, since it's not in your expected output. I'm dropping it with dplyr::select(-dummy). One tricky thing with tidyr::spread is that you need some way to mark rows uniquely—annoying, but also prevents errors in reshaping the data. So I'm grouping by band, then adding row numbers as follows:

library(tidyr)
library(dplyr)

res3 <- data.frame(X, Y, band, reflec, dummy)

res3 %>%
  select(-dummy) %>%
  group_by(band) %>%
  mutate(row = row_number())
#> # A tibble: 10 x 5
#> # Groups:   band [2]
#>          X       Y band  reflec   row
#>      <dbl>   <dbl> <fct>  <dbl> <int>
#>  1 804519. 7673833 band1   9.61     1
#>  2 804520. 7673833 band1  10.5      2
#>  3 804520. 7673833 band1   8.65     3
#>  4 804519. 7673833 band1   9.94     4
#>  5 804519. 7673833 band1  11.3      5
#>  6 804519. 7673833 band2   9.61     1
#>  7 804520. 7673833 band2  10.5      2
#>  8 804520. 7673833 band2   8.65     3
#>  9 804519. 7673833 band2   9.94     4
#> 10 804519. 7673833 band2  11.3      5

That way, row 1 will have a band 1 value and a band 2 value, and so on. Then I call spread with the band as the key to become columns and the reflec values to populate those columns, and finally drop the row number column.

res3 %>%
  select(-dummy) %>%
  group_by(band) %>%
  mutate(row = row_number()) %>%
  spread(key = band, value = reflec) %>%
  select(-row)
#> # A tibble: 5 x 4
#>         X       Y band1 band2
#>     <dbl>   <dbl> <dbl> <dbl>
#> 1 804519. 7673833  9.61  9.61
#> 2 804519. 7673833  9.94  9.94
#> 3 804519. 7673833 11.3  11.3 
#> 4 804520. 7673833 10.5  10.5 
#> 5 804520. 7673833  8.65  8.65

Created on 2019-01-28 by the reprex package (v0.2.1)

Upvotes: 3

Gonzalo Falloux Costa
Gonzalo Falloux Costa

Reputation: 372

RES4<-dcast(RES3, ... ~ band,
            value.var="reflec")

Upvotes: 0

akrun
akrun

Reputation: 887048

The value.var should be a string that specifies the column name. According to ?dcast

value.var - name of column which stores values

and name of column is character string

By specifying it unquoted, it is searching for the column name from the values

dcast(RES3, X + Y + dummy ~ band,
         fun.aggregate = length,
         value.var="reflec")

Upvotes: 2

Related Questions