Reputation: 13
I am a beginner R user so apologies if this is not quite fluent. I want to spread my data so that all variables for one Sample_file is in one row. My data (RW_leftjoin) currently looks something like this:
Sample_File Marker Peak Allele Height
1: A02_1710963103.fsa AMEL 1 X 5137
2: A02_1710963103.fsa AMEL 2 Y 4898
3: A02_1710963103.fsa CSF1PO 1 11 805
4: A02_1710963103.fsa CSF1PO 2 12 652
I want my data to look like this:
Sample_File AMEL1 AMEL2 Height1 Height2 CSF1PO1 CSF1PO1 Height1 Height2
1: A02_1710963103.fsa X Y 5137 4898 11 12 805 652
Is that possible using R?
I have tried using this:
RW_spread <- RW_leftjoin %>%
rowid_to_column() %>%
group_by(Sample_File, Marker) %>%
mutate(ID = paste0(Marker, Peak)) %>%
ungroup() %>%
spread(ID, Allele)
but the data looks like this:
rowid Sample_File Marker Peak Height AMEL1 AMEL2 CSF1PO1 CSF1PO2
<int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1 A02_1710963103.fsa AMEL 1 5137 X NA NA NA
2 2 A02_1710963103.fsa AMEL 2 4898 NA Y NA NA
3 3 A02_1710963103.fsa CSF1PO 1 805 NA NA 11 NA
4 4 A02_1710963103.fsa CSF1PO 2 652 NA NA NA 12
Would appreciate advice on this.
Upvotes: 1
Views: 628
Reputation: 6234
Note that tidyr
's current development version (0.8.3.900) contains the function pivot_wider
, which casts the data.frame to the desired wide format in a single function call (see also the vignette at Tidyr: Pivoting):
library(tidyr)
pivot_wider(df, names_from = c("Marker", "Peak"), values_from = c("Allele", "Height"))
#> Sample_File Allele_AMEL_1 Allele_AMEL_2 Allele_CSF1PO_1
#> 1 A02_1710963103.fsa X Y 11
#> Allele_CSF1PO_2 Height_AMEL_1 Height_AMEL_2 Height_CSF1PO_1
#> 1 12 5137 4898 805
#> Height_CSF1PO_2
#> 1 652
packageVersion("tidyr")
#> [1] '0.8.3.9000'
Data
df <- structure(list(Sample_File = c("A02_1710963103.fsa", "A02_1710963103.fsa",
"A02_1710963103.fsa", "A02_1710963103.fsa"), Marker = c("AMEL",
"AMEL", "CSF1PO", "CSF1PO"), Peak = c(1L, 2L, 1L, 2L), Allele = c("X",
"Y", "11", "12"), Height = c(5137L, 4898L, 805L, 652L)), row.names = c(NA,
-4L), class = "data.frame")
Upvotes: 0
Reputation: 852
As I personally prefer data.table
to the tidyverse
, I would do
dcast(df,Sample_File~Marker+Peak,value.var=c("Allele","Height"))
what that command does is grouping the samples by Marker and Peak (Sample_File~Marker+Peak
) and fill the table with values from Allele and Height (value.var=c("Allele","Height")
)
Upvotes: 0
Reputation: 34521
A tidyr
approach is to gather
the Allele
and Height
variables into a single column and create a key variable with the remaining columns (excluding the variable serving as an id) using unite
and then spread
ing the key/value pair.
library(tidyr)
RW_leftjoin %>%
gather(key, value, Allele, Height) %>%
unite(tmp, c("Marker", "Peak", "key")) %>%
spread(tmp, value)
Sample_File AMEL_1_Allele AMEL_1_Height AMEL_2_Allele AMEL_2_Height CSF1PO_1_Allele CSF1PO_1_Height CSF1PO_2_Allele CSF1PO_2_Height
1 A02_1710963103.fsa X 5137 Y 4898 11 805 12 652
Upvotes: 2