Reputation: 17611
I have a data frame like the following:
read.csv(text="num,placed,recovered
1,2013-02-22 12:14:00,2013-02-27 15:14:00
1,2013-03-03 17:32:00,2013-03-07 17:32:00
1,2013-04-24 10:13:00,2013-04-26 07:47:00
1,2013-04-15 14:51:00,2013-04-19 09:36:00
1,2013-04-11 11:56:00,2013-04-15 12:52:00
10,2013-02-22 07:30:00,2013-02-27 14:55:00
10,2013-03-03 17:20:00,2013-03-07 17:20:00
10,2013-04-15 15:22:00,2013-04-19 09:48:00
10,2013-02-17 10:38:00,2013-02-22 07:18:00
10,2013-04-11 10:09:00,2013-04-15 13:21:00
10,2013-04-24 10:07:00,2013-04-26 08:23:00
11,2013-02-22 14:23:00,2013-02-27 15:50:00
11,2013-04-11 12:51:00,2013-04-14 09:40:00
11,2013-04-15 14:45:00,2013-04-19 08:28:00
11,2013-04-19 10:13:00,2013-04-23 12:01:00
14,2013-03-01 13:45:00,2013-03-08 14:28:00
14,2013-02-22 13:22:00,2013-02-27 15:24:00
14,2013-04-04 15:36:00,2013-04-17 15:04:00",header=TRUE)
I'd like to rearrange it such that each entery in num
appears once with all its placed
and recovered
values in one row. The following would be an example row:
num placed1 recovered1 placed2 recovered2 placed3 recovered3 placed4 recovered4 placed5 recovered5
1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00 2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00 2013-04-11 11:56:00 2013-04-15 12:52:00
Some rows will have differing numbers of placed and recovered values. It's fine for NA
s to appear in those spots. I've tried using the reshape function, but can't seem to get what I'm after.
I'm doing this as a step in subsetting a data set that I am cleaning up. The other dataset has measurements recorded over time along with the time of collection. The device on which the data was taken is stored in the num
column. I want to take a subset of that data frame to acquire only the intervals for which that device was placed (the time between each pair of placed
and recovered
data). So, the other data frame would look something like the following:
num temp time
1 5 2013-02-22 12:13:50
1 6 2013-02-22 12:14:00
1 4 2013-02-22 12:14:10
1 9 2013-04-24 09:45:20
1 7 2013-04-24 11:45:50
10 23 2013-03-03 19:23:40
and if I were able to successfully subset it the result would like like the following
num temp time
1 6 2013-02-22 12:14:00
1 4 2013-02-22 12:14:10
1 7 2013-04-24 11:45:50
10 23 2013-03-03 19:23:40
Upvotes: 0
Views: 226
Reputation: 193637
You just need a "time" variable to be included in your dataset for reshape
to work properly:
mydf$time <- with(mydf, ave(num, num, FUN = seq_along))
head(mydf)
# num placed recovered time
# 1 1 2013-02-22 12:14:00 2013-02-27 15:14:00 1
# 2 1 2013-03-03 17:32:00 2013-03-07 17:32:00 2
# 3 1 2013-04-24 10:13:00 2013-04-26 07:47:00 3
# 4 1 2013-04-15 14:51:00 2013-04-19 09:36:00 4
# 5 1 2013-04-11 11:56:00 2013-04-15 12:52:00 5
# 6 10 2013-02-22 07:30:00 2013-02-27 14:55:00 1
reshape(mydf, idvar="num", timevar="time", direction = "wide")
# num placed.1 recovered.1 placed.2 recovered.2
# 1 1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00
# 6 10 2013-02-22 07:30:00 2013-02-27 14:55:00 2013-03-03 17:20:00 2013-03-07 17:20:00
# 12 11 2013-02-22 14:23:00 2013-02-27 15:50:00 2013-04-11 12:51:00 2013-04-14 09:40:00
# 16 14 2013-03-01 13:45:00 2013-03-08 14:28:00 2013-02-22 13:22:00 2013-02-27 15:24:00
# placed.3 recovered.3 placed.4 recovered.4
# 1 2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00
# 6 2013-04-15 15:22:00 2013-04-19 09:48:00 2013-02-17 10:38:00 2013-02-22 07:18:00
# 12 2013-04-15 14:45:00 2013-04-19 08:28:00 2013-04-19 10:13:00 2013-04-23 12:01:00
# 16 2013-04-04 15:36:00 2013-04-17 15:04:00 <NA> <NA>
# placed.5 recovered.5 placed.6 recovered.6
# 1 2013-04-11 11:56:00 2013-04-15 12:52:00 <NA> <NA>
# 6 2013-04-11 10:09:00 2013-04-15 13:21:00 2013-04-24 10:07:00 2013-04-26 08:23:00
# 12 <NA> <NA> <NA> <NA>
# 16 <NA> <NA> <NA> <NA>
If you've added that "time" variable as I did above, you can also use the "reshape2" package after making an even longer dataset. That extra-long dataset (I've called it "mydf.l" below) might be more useful for subsetting than a wide dataset:
library(reshape2)
mydf.l <- melt(mydf, id.vars=c("num", "time"))
head(mydf.l)
# num time variable value
# 1 1 1 placed 2013-02-22 12:14:00
# 2 1 2 placed 2013-03-03 17:32:00
# 3 1 3 placed 2013-04-24 10:13:00
# 4 1 4 placed 2013-04-15 14:51:00
# 5 1 5 placed 2013-04-11 11:56:00
# 6 10 1 placed 2013-02-22 07:30:00
dcast(mydf.l, num ~ variable + time)
# num placed_1 placed_2 placed_3 placed_4
# 1 1 2013-02-22 12:14:00 2013-03-03 17:32:00 2013-04-24 10:13:00 2013-04-15 14:51:00
# 2 10 2013-02-22 07:30:00 2013-03-03 17:20:00 2013-04-15 15:22:00 2013-02-17 10:38:00
# 3 11 2013-02-22 14:23:00 2013-04-11 12:51:00 2013-04-15 14:45:00 2013-04-19 10:13:00
# 4 14 2013-03-01 13:45:00 2013-02-22 13:22:00 2013-04-04 15:36:00 <NA>
# placed_5 placed_6 recovered_1 recovered_2
# 1 2013-04-11 11:56:00 <NA> 2013-02-27 15:14:00 2013-03-07 17:32:00
# 2 2013-04-11 10:09:00 2013-04-24 10:07:00 2013-02-27 14:55:00 2013-03-07 17:20:00
# 3 <NA> <NA> 2013-02-27 15:50:00 2013-04-14 09:40:00
# 4 <NA> <NA> 2013-03-08 14:28:00 2013-02-27 15:24:00
# recovered_3 recovered_4 recovered_5 recovered_6
# 1 2013-04-26 07:47:00 2013-04-19 09:36:00 2013-04-15 12:52:00 <NA>
# 2 2013-04-19 09:48:00 2013-02-22 07:18:00 2013-04-15 13:21:00 2013-04-26 08:23:00
# 3 2013-04-19 08:28:00 2013-04-23 12:01:00 <NA> <NA>
# 4 2013-04-17 15:04:00 <NA> <NA> <NA>
Upvotes: 2