shah1

Reputation: 1

Reshaping data in R from long to wide format without aggregation

I am working on a project where I have a file in long format with 131170 rows and 4 variables; not all of the values are numeric. I have been trying to use the dcast function from reshape2, but however I call it, I get the message "Aggregation function missing: defaulting to length". I do not want my data to be changed; I simply want to change the format of the file.

This is the call I have written:

W_data1 <- dcast(L_data1, formula = ID + Date ~ Metric, value.var = "Value") 

This is an example of what my file looks like:

ID     Date       Metric     Value
1003    3/5/2001    Age       74
1003    3/5/2001    Age       74
1003    3/5/2001    Age       74
1003    3/5/2001    Age       74
1003    3/5/2001    Sex        F
1003    3/5/2001    Sex        F
1003    3/5/2001    Sex        F
1003    3/5/2001    Sex        F
1003    3/5/2001    Dx         MM
1003    3/5/2001    Dx         MM
1003    3/5/2001    Dx         MM
1003    3/5/2001    Dx         MM
1003    3/5/2001    ISS.Stage   1

The wide format should look like this:

ID      Age Sex Dx  Date       ISS Stage    Heavy Chain Isotype
1003    74  F   MM  3/5/2001    1           IgA
1003    74  F   MM  3/5/2001    1           IgA
1003    74  F   MM  3/5/2001    1           IgA
1003    74  F   MM  3/5/2001    1           IgA
1004    79  F   MM  1/1/1997    Unknown     N/A

There are multiple records for each ID: some may have 4 sets of data and others just one. The IDs repeat because the same variables have different values on different dates for the same ID.

Upvotes: 0

Views: 589

Answers (3)

akrun

Reputation: 887951

You could also try reshape():

  L_data1$Metric <- with(L_data1, 
             paste0(Metric,ave(seq_along(Metric), ID, Date, Metric, FUN = seq_along)))

  res <- reshape(L_data1, timevar="Metric", idvar=c("ID", "Date"), direction="wide")
  colnames(res) <- gsub("^[[:alpha:]]+\\.","",colnames(res))

  res
 #    ID     Date Age1 Age2 Age3 Age4 Sex1 Sex2 Sex3 Sex4 Dx1 Dx2 Dx3 Dx4
 #1 1003 3/5/2001   74   74   74   74    F    F    F    F  MM  MM  MM  MM
 #  ISS.Stage1
 #1          1

Upvotes: 0

IRTFM

Reputation: 263481

You are constructing a situation where there is more than one item to put into a single cell, so dcast is asking how to combine them. If you want just the first one, supply a function that does that as the aggregation function:

W_data1 <- dcast(L_data1, formula = ID + Date ~ Metric, 
                 fun.aggregate=function(x){ as.character(x)[1] }, 
                 value.var = "Value")
W_data1
#---------------------
#     ID     Date Age Dx ISS.Stage Sex
# 1 1003 3/5/2001  74 MM         1   F
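
To see which cells would receive more than one item, one option is to count the rows per ID/Date/Metric combination before casting. This is only a sketch in base R: the data frame below is a hypothetical reconstruction of the sample rows in the question (all values stored as character), and key_counts and dups are illustrative names.

```r
# Hypothetical reconstruction of the sample rows shown in the question
L_data1 <- data.frame(
  ID     = 1003,
  Date   = "3/5/2001",
  Metric = c(rep(c("Age", "Sex", "Dx"), each = 4), "ISS.Stage"),
  Value  = c(rep(c("74", "F", "MM"), each = 4), "1"),
  stringsAsFactors = FALSE
)

# Count how many rows share each ID/Date/Metric combination
key_counts <- aggregate(Value ~ ID + Date + Metric, data = L_data1, FUN = length)
names(key_counts)[4] <- "n"

# Any combination with n > 1 is one that dcast would have to aggregate
dups <- key_counts[key_counts$n > 1, ]
dups
```

If dups comes back empty, the cast should not need an aggregation function at all.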

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

There are duplicated values in the combination of your LHS and RHS variables. You need to add an indicator variable to distinguish between the unique values if you don't want dcast to resort to length.

Try:

L_data1$ind <- ave(1:nrow(L_data1), L_data1[1:3], FUN = seq_along)

dcast(L_data1, ID + Date ~ Metric + ind, value.var = "Value")
#     ID     Date Age_1 Age_2 Age_3 Age_4 Dx_1 Dx_2 Dx_3 Dx_4 ISS.Stage_1
# 1 1003 3/5/2001    74    74    74    74   MM   MM   MM   MM           1
#   Sex_1 Sex_2 Sex_3 Sex_4
# 1     F     F     F     F 
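
If the goal is one row per repeat, as in the target layout in the question, a possible variation is to put the indicator on the left-hand side of the formula instead. This is a sketch under assumptions: the data frame is a hypothetical reconstruction of the question's sample rows, W_rows is an illustrative name, and metrics that occur fewer times than others (ISS.Stage here) would be filled with NA in the extra rows.

```r
library(reshape2)

# Hypothetical reconstruction of the sample rows shown in the question
L_data1 <- data.frame(
  ID     = 1003,
  Date   = "3/5/2001",
  Metric = c(rep(c("Age", "Sex", "Dx"), each = 4), "ISS.Stage"),
  Value  = c(rep(c("74", "F", "MM"), each = 4), "1"),
  stringsAsFactors = FALSE
)

# Number the repeats within each ID/Date/Metric combination
L_data1$ind <- ave(seq_len(nrow(L_data1)),
                   L_data1$ID, L_data1$Date, L_data1$Metric,
                   FUN = seq_along)

# With ind on the left-hand side, every ID/Date/ind/Metric cell holds
# at most one value, so no aggregation function is needed
W_rows <- dcast(L_data1, ID + Date + ind ~ Metric, value.var = "Value")
W_rows
```

Whether pairing the first Age with the first Sex, the second with the second, and so on is meaningful depends on the row order of the long file, so that is worth checking against the real data.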

Upvotes: 1
