Reputation: 1
I am working on a project with a file in long format: 131170 observations of 4 variables, and not all of the values are numeric. I have been trying to use the dcast function from reshape2, but whatever I try, it gives me the message "Aggregation function missing: defaulting to length". I do not want my data to be changed; I simply want to change the format of the file.
This is the call I have written:
W_data1 <- dcast(L_data1, formula = ID + Date ~ Metric, value.var = "Value")
This is an example of what my file looks like:
ID Date Metric Value
1003 3/5/2001 Age 74
1003 3/5/2001 Age 74
1003 3/5/2001 Age 74
1003 3/5/2001 Age 74
1003 3/5/2001 Sex F
1003 3/5/2001 Sex F
1003 3/5/2001 Sex F
1003 3/5/2001 Sex F
1003 3/5/2001 Dx MM
1003 3/5/2001 Dx MM
1003 3/5/2001 Dx MM
1003 3/5/2001 Dx MM
1003 3/5/2001 ISS.Stage 1
The wide format should look like this:
ID Age Sex Dx Date ISS Stage Heavy Chain Isotype
1003 74 F MM 3/5/2001 1 IgA
1003 74 F MM 3/5/2001 1 IgA
1003 74 F MM 3/5/2001 1 IgA
1003 74 F MM 3/5/2001 1 IgA
1004 79 F MM 1/1/1997 Unknown N/A
There are multiple records for each ID: some have 4 sets of data and others just one. The IDs repeat because the same variables have different values on different dates for the same ID.
Upvotes: 0
Views: 589
Reputation: 887951
You could also try reshape from base R:
L_data1$Metric <- with(L_data1,
    paste0(Metric, ave(seq_along(Metric), ID, Date, Metric, FUN = seq_along)))
res <- reshape(L_data1, timevar="Metric", idvar=c("ID", "Date"), direction="wide")
colnames(res) <- gsub("^[[:alpha:]]+\\.","",colnames(res))
res
# ID Date Age1 Age2 Age3 Age4 Sex1 Sex2 Sex3 Sex4 Dx1 Dx2 Dx3 Dx4
#1 1003 3/5/2001 74 74 74 74 F F F F MM MM MM MM
# ISS.Stage1
#1 1
Upvotes: 0
Reputation: 263481
You are constructing a situation where more than one item has to go into a single cell, so dcast is asking how to combine them. If you want just the first one, supply a function that does that as the aggregation function:
W_data1 <- dcast(L_data1, formula = ID + Date ~ Metric,
                 fun.aggregate = function(x) { as.character(x)[1] },
                 value.var = "Value")
W_data1
#---------------------
ID Date Age Dx ISS.Stage Sex
1 1003 3/5/2001 74 MM 1 F
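Keeping only the first value is lossless only when every duplicate within an ID/Date/Metric group is identical. A minimal base R sketch of that check follows; the sample data frame is rebuilt by hand from the rows shown in the question, and the names checks and conflicts are made up for illustration.

```r
# Sample data reconstructed from the question (an assumption: the real file
# has 131170 rows and may contain groups whose values differ).
L_data1 <- data.frame(
  ID = 1003,
  Date = "3/5/2001",
  Metric = c(rep(c("Age", "Sex", "Dx"), each = 4), "ISS.Stage"),
  Value = c(rep(c("74", "F", "MM"), each = 4), "1"),
  stringsAsFactors = FALSE
)

# Count the distinct values inside each ID/Date/Metric group.
checks <- aggregate(Value ~ ID + Date + Metric, data = L_data1,
                    FUN = function(x) length(unique(x)))
names(checks)[names(checks) == "Value"] <- "n_distinct"

# Groups with more than one distinct value would lose information
# if only the first value were kept.
conflicts <- checks[checks$n_distinct > 1, ]
nrow(conflicts)
```

If conflicts has any rows, a first-value aggregation would silently drop differing values for those groups, so an indicator-based reshape would probably be the safer choice there.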
Upvotes: 1
Reputation: 193687
There are duplicated values in the combination of your LHS and RHS variables. You need to add an indicator variable to distinguish between the unique values if you don't want dcast to resort to length.
Try:
L_data1$ind <- ave(1:nrow(L_data1), L_data1[1:3], FUN = seq_along)
dcast(L_data1, ID + Date ~ Metric + ind, value.var = "Value")
# ID Date Age_1 Age_2 Age_3 Age_4 Dx_1 Dx_2 Dx_3 Dx_4 ISS.Stage_1
# 1 1003 3/5/2001 74 74 74 74 MM MM MM MM 1
# Sex_1 Sex_2 Sex_3 Sex_4
# 1 F F F F
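The desired output in the question shows one row per repeat rather than one column per repeat. If that is what is wanted (an assumption on my part), the same indicator can go on the row side instead. Here is a sketch using base R's reshape so that it runs without reshape2; the sample data is rebuilt by hand from the question, and wide is just an illustrative name.

```r
# Sample data reconstructed from the question (assumed, not the real file).
L_data1 <- data.frame(
  ID = 1003,
  Date = "3/5/2001",
  Metric = c(rep(c("Age", "Sex", "Dx"), each = 4), "ISS.Stage"),
  Value = c(rep(c("74", "F", "MM"), each = 4), "1"),
  stringsAsFactors = FALSE
)

# Number the repeats within each ID/Date/Metric group: 1, 2, 3, ...
L_data1$ind <- ave(seq_len(nrow(L_data1)),
                   L_data1$ID, L_data1$Date, L_data1$Metric,
                   FUN = seq_along)

# Treat the indicator as part of the row identifier, so each repeat
# becomes its own row and each Metric becomes a column.
wide <- reshape(L_data1, idvar = c("ID", "Date", "ind"),
                timevar = "Metric", direction = "wide")

# reshape() prefixes the new columns with "Value."; strip that prefix.
names(wide) <- sub("^Value\\.", "", names(wide))
wide
```

With this sample, ISS.Stage is filled only in the first row and is NA in the other three, because the sample contains a single ISS.Stage record. With reshape2 loaded, the equivalent dcast formula would presumably be ID + Date + ind ~ Metric.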
Upvotes: 1