Ed G
Ed G

Reputation: 822

Reshaping an untidy data frame

*Editing in response to comments

I have a dataset that I am trying to prepare for analysis:

raw<-data.frame(
  name=c("Place 1", "Place 2", "Place 3", "Place 4"),
  x.1.Jan.12=c(1, NA, 0.5, NA),
  Jan.time=c("0900", NA, "0930", NA),
  x.15.Jan.12=c(NA, 0.7, NA, NA),
  Jan.time=c(NA, "1030", NA, NA),
  x.3.Feb.12=c(0.8, 0.6, 0.4, NA),
  Feb.time=c("0715", "0800", "0830", NA),
  x.8.Feb.12=c(NA, NA, 0.65, 0.33),
  Feb.time=c(NA, NA, "?", "1123")
  )

The data should be quite straightforward: a location with a result, a date for the result and the time it was collected. As you'll see, date has been used to name the variable containing the result. Each 'time' variable relates to the column before it - the first 'Jan.time' variable is the time of the results in 'x.1.Jan.12'

I want to restructure the data into four variables - name, date, time and value. I'm pretty sure reshape2 can do it and have got the data melted:

mDat<-melt(raw, id=c("name"))

Can't work out the next steps - probably to do with the wierd variable names.

The result I'd like is along these lines:

outData<-data.frame(
  name=c("Place 1", "Place 2", "Place 3", "Place 4", "Place 1", "Place 2", "Place 3", "Place 4", "Place 1", "Place 2", "Place 3", "Place 4", "Place 1", "Place 2", "Place 3", "Place 4"),
  date=c("1-Jan-12", "1-Jan-12", "1-Jan-12", "1-Jan-12", "15-Jan-12", "15-Jan-12", "15-Jan-12", "15-Jan-12", "3-Feb-12", "3-Feb-12", "3-Feb-12", "3-Feb-12", "8-Feb-12", "8-Feb-12", "8-Feb-12", "8-Feb-12"),
  value=c(1, NA, 0.5, NA, NA, 0.7, NA, NA, 0.8, 0.6, 0.4, NA, NA, NA, 0.65, 0.33),
  time=c("0900", NA, "0930", NA, NA, "1030", NA, NA, "0715", "0800", "0830", NA, NA, NA, "?", "1123")
)

Upvotes: 2

Views: 209

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

One option is to use melt() from "reshape2" on different subsets of your data.frame. The subsets can be extracted using grep().

library(reshape2)
temp <- cbind(
    setNames(melt(raw[c(1, grep("time", names(raw)))], id.vars="name"), 
             c("name", "mon.time", "time")),
    setNames(melt(raw[grep("time", names(raw), invert = TRUE)], id.vars="name"),
             c("name", "date", "result")))
temp[, c("name", "result", "time", "date")]
#       name result time        date
# 1  Place 1   1.00 0900  x.1.Jan.12
# 2  Place 2     NA <NA>  x.1.Jan.12
# 3  Place 3   0.50 0930  x.1.Jan.12
# 4  Place 4     NA <NA>  x.1.Jan.12
# 5  Place 1     NA <NA> x.15.Jan.12
# 6  Place 2   0.70 1030 x.15.Jan.12
# 7  Place 3     NA <NA> x.15.Jan.12
# 8  Place 4     NA <NA> x.15.Jan.12
# 9  Place 1   0.80 0715  x.3.Feb.12
# 10 Place 2   0.60 0800  x.3.Feb.12
# 11 Place 3   0.40 0830  x.3.Feb.12
# 12 Place 4     NA <NA>  x.3.Feb.12
# 13 Place 1     NA <NA>  x.8.Feb.12
# 14 Place 2     NA <NA>  x.8.Feb.12
# 15 Place 3   0.65    ?  x.8.Feb.12
# 16 Place 4   0.33 1123  x.8.Feb.12

Upvotes: 1

Ed G
Ed G

Reputation: 822

A new day often helps. I've managed to work up a non reshape solution, but it uses a dreaded for loop:

subList<-list()
for(i in seq(2,8,2)){
  temp<-raw[c(1, i, i+1)]
  temp$date<-rep(names(temp)[2], nrow(temp))
  names(temp)<-c("name", "result", "time", "date")
  subList[[i/2]]<-temp
}

solution1<-do.call("rbind", subList)

Upvotes: 0

Related Questions