Reputation: 1823
I've like to use dcast function (reshape2 package) for returns a reshape data frame, but doesn't work. In my example:
#Data set
X<-c(804519.4,804519.6,804519.6,804519.4,804519.4,804519.4,804519.6,804519.6,804519.4,804519.4)
Y<-c(7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833)
band<-c("band1","band1","band1","band1","band1","band2","band2","band2","band2","band2")# My original data set are 31 bands
reflec<-c(9.608848,10.504454,8.648237,9.935091,11.282750,9.608848,10.504454,8.648237,9.935091,11.282750)
dummy<-1:10
RES3<-data.frame(X,Y,band,reflec,dummy)
RES3
X Y band reflec dummy
1 804519.4 7673833 band1 9.608848 1
2 804519.6 7673833 band1 10.504454 2
3 804519.6 7673833 band1 8.648237 3
4 804519.4 7673833 band1 9.935091 4
5 804519.4 7673833 band1 11.282750 5
6 804519.4 7673833 band2 9.608848 6
7 804519.6 7673833 band2 10.504454 7
8 804519.6 7673833 band2 8.648237 8
9 804519.4 7673833 band2 9.935091 9
10 804519.4 7673833 band2 11.282750 10
RES3<-as.data.frame(RES3)
colnames(RES3)<-c("X","Y","band","reflec","dummy")
dcast(RES3, X + Y + dummy ~ band,
fun.aggregate = length,
value.var="reflec")
Doesn't work, my output is:
X Y dummy band1 band2
1 804519.4 7673833 1 1 0
2 804519.4 7673833 4 1 0
3 804519.4 7673833 5 1 0
4 804519.4 7673833 6 0 1
5 804519.4 7673833 9 0 1
6 804519.4 7673833 10 0 1
7 804519.6 7673833 2 1 0
8 804519.6 7673833 3 1 0
9 804519.6 7673833 7 0 1
10 804519.6 7673833 8 0 1
I expected:
X Y band1 band2
1 804519.4 7673833 9.608848 9.608848
2 804519.6 7673833 10.504454 10.504454
3 804519.6 7673833 8.648237 8.648237
4 804519.4 7673833 9.935091 9.935091
5 804519.4 7673833 11.282750 11.282750
Any member could help me, because my original data set are 31 bands as levels and I've like to convert in columns? Thanks!
Upvotes: 2
Views: 308
Reputation: 16832
Like I mentioned, reshape2
has been deprecated in favor of tidyr
within the tidyverse
packages. In my opinion (and that of the packages' authors), tidyr
's spread
and gather
are a bit more clear than reshape2
's cast
and melt
—no formula notation, cleaner ways to specify values. (Some context for that is here).
Also as I mentioned, you've got a couple extra lines—data.frame
will add the column names according to the names of the vectors going into it.
I've updated this answer to match the new data you posted. My original solution worked for your original data, but needs some more steps for your new data, which I'm doing with dplyr
functions.
I don't totally get the dummy
column at this point, since it's not in your expected output. I'm dropping it with dplyr::select(-dummy)
. One tricky thing with tidyr::spread
is that you need some way to mark rows uniquely—annoying, but also prevents errors in reshaping the data. So I'm grouping by band
, then adding row numbers as follows:
library(tidyr)
library(dplyr)
res3 <- data.frame(X, Y, band, reflec, dummy)
res3 %>%
select(-dummy) %>%
group_by(band) %>%
mutate(row = row_number())
#> # A tibble: 10 x 5
#> # Groups: band [2]
#> X Y band reflec row
#> <dbl> <dbl> <fct> <dbl> <int>
#> 1 804519. 7673833 band1 9.61 1
#> 2 804520. 7673833 band1 10.5 2
#> 3 804520. 7673833 band1 8.65 3
#> 4 804519. 7673833 band1 9.94 4
#> 5 804519. 7673833 band1 11.3 5
#> 6 804519. 7673833 band2 9.61 1
#> 7 804520. 7673833 band2 10.5 2
#> 8 804520. 7673833 band2 8.65 3
#> 9 804519. 7673833 band2 9.94 4
#> 10 804519. 7673833 band2 11.3 5
That way, row 1 will have a band 1 value and a band 2 value, and so on. Then I call spread
with the band as the key to become columns and the reflec values to populate those columns, and finally drop the row number column.
res3 %>%
select(-dummy) %>%
group_by(band) %>%
mutate(row = row_number()) %>%
spread(key = band, value = reflec) %>%
select(-row)
#> # A tibble: 5 x 4
#> X Y band1 band2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 804519. 7673833 9.61 9.61
#> 2 804519. 7673833 9.94 9.94
#> 3 804519. 7673833 11.3 11.3
#> 4 804520. 7673833 10.5 10.5
#> 5 804520. 7673833 8.65 8.65
Created on 2019-01-28 by the reprex package (v0.2.1)
Upvotes: 3
Reputation: 887048
The value.var
should be a string that specifies the column name. According to ?dcast
value.var - name of column which stores values
and name
of column is character string
By specifying it unquoted, it is searching for the column name from the values
dcast(RES3, X + Y + dummy ~ band,
fun.aggregate = length,
value.var="reflec")
Upvotes: 2