Reputation: 1529
I have this following data:
df <- data.frame(
wave = c(350, 352),
cdom = c(0.164910664183534, 0.161336423973549),
total = c(0.173292853508359, 0.164541380243188),
wave_1 = c(350, 352),
cdom_1 = c(0.157738707282744, 0.149555740098184),
total_1 = c(0.16501632769282, 0.151631889636391),
wave_2 = c(350, 352),
cdom_2 = c(0.143293704793142, 0.133057094683334),
total_2 = c(0.148878497119496, 0.136150629840465),
wave_3 = c(350, 352),
cdom_3 = c(0.0972284241775975, 0.0906890150335725),
total_3 = c(0.108645612944463, 0.103640164204995),
wave_4 = c(350, 352),
cdom_4 = c(0.0801780489449968, 0.0779336395415438),
total_4 = c(0.103930690374372, 0.095768602460239))
df
#> wave cdom total wave_1 cdom_1 total_1 wave_2 cdom_2
#> 1 350 0.1649107 0.1732929 350 0.1577387 0.1650163 350 0.1432937
#> 2 352 0.1613364 0.1645414 352 0.1495557 0.1516319 352 0.1330571
#> total_2 wave_3 cdom_3 total_3 wave_4 cdom_4 total_4
#> 1 0.1488785 350 0.09722842 0.1086456 350 0.08017805 0.1039307
#> 2 0.1361506 352 0.09068902 0.1036402 352 0.07793364 0.0957686
Each triple column represents 1 sample. Hence, in this example, there are 5 samples. I would like to reshape this dataframe
to longer with 4 columns (sample
, wave
, cdom
and total
). The solution can be base R, dplyr
or data.table
as I do not have any preference.
Upvotes: 0
Views: 71
Reputation: 378
A complete R base way to do that:
df <- cbind(id=c("a", "b"), df)
ls <- list()
for (i in 1:3) {ls[i] <- list(3*(1:(ncol(df)/3))+i-2)}
reshape(df, direction = "long", idvar = "id", varying = ls)
OUTPUT
id time wave cdrom total
a.1 a 1 350 0.16491066 0.1732929
b.1 b 1 352 0.16133642 0.1645414
a.2 a 2 350 0.15773871 0.1650163
b.2 b 2 352 0.14955574 0.1516319
a.3 a 3 350 0.14329370 0.1488785
b.3 b 3 352 0.13305709 0.1361506
a.4 a 4 350 0.09722842 0.1086456
b.4 b 4 352 0.09068902 0.1036402
a.5 a 5 350 0.08017805 0.1039307
b.5 b 5 352 0.07793364 0.0957686
Upvotes: 0
Reputation: 2435
using data.table:
library("data.table")
df <- data.table(df)
df[1, id := "a"]
df[2, id := "b"]
melt(df, id.vars = "id" ,
measure.vars = patterns(wave = "wave",
cdom = "cdom",
total = "total"))
id variable wave cdom total
1: a 1 350 0.16491066 0.1732929
2: b 1 352 0.16133642 0.1645414
3: a 2 350 0.15773871 0.1650163
4: b 2 352 0.14955574 0.1516319
5: a 3 350 0.14329370 0.1488785
6: b 3 352 0.13305709 0.1361506
7: a 4 350 0.09722842 0.1086456
8: b 4 352 0.09068902 0.1036402
9: a 5 350 0.08017805 0.1039307
10: b 5 352 0.07793364 0.0957686
Upvotes: 2
Reputation: 10761
This takes a little bit of work, since the columns aren't formatted uniformly, and you have multiple observations per "sample". We can use functions from the tidyverse
package:
library(tidyverse)
df %>%
# create column for observation order
mutate(rn = row_number()) %>%
# wide to long
gather(variable, value, -rn) %>%
# separate variable columns (e.g. wave_1 -> wave, 1)
separate(variable, into = c("var", "sample")) %>%
# fix NA for "first" sample
mutate(sample = ifelse(is.na(sample), 0, sample)) %>%
# reshape from long to a little less long
group_by(rn, sample) %>%
spread(var, value)
# rn sample cdom total wave
# <int> <chr> <dbl> <dbl> <dbl>
# 1 1 0 0.165 0.173 350
# 2 1 1 0.158 0.165 350
# 3 1 2 0.143 0.149 350
# 4 1 3 0.0972 0.109 350
# 5 1 4 0.0802 0.104 350
# 6 2 0 0.161 0.165 352
# 7 2 1 0.150 0.152 352
# 8 2 2 0.133 0.136 352
# 9 2 3 0.0907 0.104 352
# 10 2 4 0.0779 0.0958 352
Upvotes: 1