Reputation: 2051
I have the following wide format dataframe:
df<-structure(list(ID = c(1, 2, 3), A.1 = c(0, 1, 0), A.2 = c(1,
1, 0), B.1 = c(99, 99, 0), B.2 = c(99, 99, 0)), .Names = c("ID",
"A.1", "A.2", "B.1", "B.2"), row.names = c(NA, 3L), class = "data.frame")
> df
ID A.1 A.2 B.1 B.2
1 1 0 1 99 99
2 2 1 1 99 99
3 3 0 0 0 0
Now I am changing it to long format:
long.df<-reshape (df,idvar = "ID", varying=c(2:5),v.names= c("A", "B"),
timevar="time",direction="long", sep = ".")
Here's the resulting long data frame:
> long.df
ID time A B
1.1 1 1 0 1
2.1 2 1 1 1
3.1 3 1 0 0
1.2 1 2 99 99
2.2 2 2 99 99
3.2 3 2 0 0
So, this conversion is not correct and the values became confused. For example, the value of parameter B for ID 1 for timepoint 1 changed from 99 to 1, the value of parameter A became 99 for IDs 1 and 2 at second timepoint and so on.
The expected output should be as follows:
> expected.long.df
ID time A B
1.1 1 1 0 99
2.1 2 1 1 99
3.1 3 1 0 0
1.2 1 2 1 99
2.2 2 2 1 99
3.2 3 2 0 0
Have no idea why it happened. Would be very greateful for any suggestion.
Upvotes: 1
Views: 132
Reputation: 1730
Try this. You're essentially looking at a melt
ing operation.
library(data.table)
df<-structure(list(ID = c(1, 2, 3), A.1 = c(0, 1, 0), A.2 = c(1, 1, 0), B.1 = c(99, 99, 0), B.2 = c(99, 99, 0)), .Names = c("ID", "A.1", "A.2", "B.1", "B.2"), row.names = c(NA, 3L), class = "data.frame")
dt <- setDT(df)
melt(dt, id = 'ID', measure = patterns('^A.', '^B.'), value.name = c('A', 'B'), variable.name = 'time')
ID time A B
1: 1 1 0 99
2: 2 1 1 99
3: 3 1 0 0
4: 1 2 1 99
5: 2 2 1 99
6: 3 2 0 0
Upvotes: 2
Reputation: 887118
The problem was in the varying
. We need to specify the patterns correctly
reshape(df, idvar = "ID", varying = list(grep("^A", names(df)),
grep("^B", names(df))), direction = "long", v.names = c("A", "B"))
# ID time A B
#1.1 1 1 0 99
#2.1 2 1 1 99
#3.1 3 1 0 0
#1.2 1 2 1 99
#2.2 2 2 1 99
#3.2 3 2 0 0
Upvotes: 4
Reputation: 323236
Base on you reshape
and stringr
:str_split_fixed
df=melt(df,'ID')
df[,c('Time','Name')]=str_split_fixed(as.character(df$variable),"[.]",2)
df$variable=NULL
reshape(df, idvar = c("ID","Name"), timevar = "Time", direction = "wide")
ID Name value.A value.B
1 1 1 0 99
2 2 1 1 99
3 3 1 0 0
4 1 2 1 99
5 2 2 1 99
6 3 2 0 0
Upvotes: 2
Reputation: 5068
I would use the tidyr
library:
library(tidyr)
temp1 = gather(df, key = "x", value = "y", -ID)
temp2 = separate(temp1, x, into = c("z", "time"), sep = "[.]")
temp3 = spread(temp2, key = z, value = y)
The temp3
table looks like your desired result, but not exactly the same order. Use dplyr
's arrange
to get it right:
> dplyr::arrange(temp3, time, ID)
ID time A B
1 1 1 0 99
2 2 1 1 99
3 3 1 0 0
4 1 2 1 99
5 2 2 1 99
6 3 2 0 0
Upvotes: 2