Jota
Jota

Reputation: 17611

Reshaping a dataframe from long to wide with irregular row lengths

I have a data frame like the following:

read.csv(text="num,placed,recovered
1,2013-02-22 12:14:00,2013-02-27 15:14:00
1,2013-03-03 17:32:00,2013-03-07 17:32:00
1,2013-04-24 10:13:00,2013-04-26 07:47:00
1,2013-04-15 14:51:00,2013-04-19 09:36:00
1,2013-04-11 11:56:00,2013-04-15 12:52:00
10,2013-02-22 07:30:00,2013-02-27 14:55:00
10,2013-03-03 17:20:00,2013-03-07 17:20:00
10,2013-04-15 15:22:00,2013-04-19 09:48:00
10,2013-02-17 10:38:00,2013-02-22 07:18:00
10,2013-04-11 10:09:00,2013-04-15 13:21:00
10,2013-04-24 10:07:00,2013-04-26 08:23:00
11,2013-02-22 14:23:00,2013-02-27 15:50:00
11,2013-04-11 12:51:00,2013-04-14 09:40:00
11,2013-04-15 14:45:00,2013-04-19 08:28:00
11,2013-04-19 10:13:00,2013-04-23 12:01:00
14,2013-03-01 13:45:00,2013-03-08 14:28:00
14,2013-02-22 13:22:00,2013-02-27 15:24:00
14,2013-04-04 15:36:00,2013-04-17 15:04:00",header=TRUE)

I'd like to rearrange it such that each entery in num appears once with all its placed and recovered values in one row. The following would be an example row:

num           placed1          recovered1             placed2          recovered2             placed3          recovered3             placed4          recovered4             placed5          recovered5
1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00 2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00 2013-04-11 11:56:00 2013-04-15 12:52:00

Some rows will have differing numbers of placed and recovered values. It's fine for NAs to appear in those spots. I've tried using the reshape function, but can't seem to get what I'm after.

I'm doing this as a step in subsetting a data set that I am cleaning up. The other dataset has measurements recorded over time along with the time of collection. The device on which the data was taken is stored in the num column. I want to take a subset of that data frame to acquire only the intervals for which that device was placed (the time between each pair of placed and recovered data). So, the other data frame would look something like the following:

num  temp time
1    5    2013-02-22 12:13:50
1    6    2013-02-22 12:14:00
1    4    2013-02-22 12:14:10
1    9    2013-04-24 09:45:20
1    7    2013-04-24 11:45:50
10   23   2013-03-03 19:23:40

and if I were able to successfully subset it the result would like like the following

num  temp time
1    6    2013-02-22 12:14:00
1    4    2013-02-22 12:14:10
1    7    2013-04-24 11:45:50
10   23   2013-03-03 19:23:40

Upvotes: 0

Views: 226

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193637

You just need a "time" variable to be included in your dataset for reshape to work properly:

mydf$time <- with(mydf, ave(num, num, FUN = seq_along))
head(mydf)
#   num              placed           recovered time
# 1   1 2013-02-22 12:14:00 2013-02-27 15:14:00    1
# 2   1 2013-03-03 17:32:00 2013-03-07 17:32:00    2
# 3   1 2013-04-24 10:13:00 2013-04-26 07:47:00    3
# 4   1 2013-04-15 14:51:00 2013-04-19 09:36:00    4
# 5   1 2013-04-11 11:56:00 2013-04-15 12:52:00    5
# 6  10 2013-02-22 07:30:00 2013-02-27 14:55:00    1
reshape(mydf, idvar="num", timevar="time", direction = "wide")
#    num            placed.1         recovered.1            placed.2         recovered.2
# 1    1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00
# 6   10 2013-02-22 07:30:00 2013-02-27 14:55:00 2013-03-03 17:20:00 2013-03-07 17:20:00
# 12  11 2013-02-22 14:23:00 2013-02-27 15:50:00 2013-04-11 12:51:00 2013-04-14 09:40:00
# 16  14 2013-03-01 13:45:00 2013-03-08 14:28:00 2013-02-22 13:22:00 2013-02-27 15:24:00
#               placed.3         recovered.3            placed.4         recovered.4
# 1  2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00
# 6  2013-04-15 15:22:00 2013-04-19 09:48:00 2013-02-17 10:38:00 2013-02-22 07:18:00
# 12 2013-04-15 14:45:00 2013-04-19 08:28:00 2013-04-19 10:13:00 2013-04-23 12:01:00
# 16 2013-04-04 15:36:00 2013-04-17 15:04:00                <NA>                <NA>
#               placed.5         recovered.5            placed.6         recovered.6
# 1  2013-04-11 11:56:00 2013-04-15 12:52:00                <NA>                <NA>
# 6  2013-04-11 10:09:00 2013-04-15 13:21:00 2013-04-24 10:07:00 2013-04-26 08:23:00
# 12                <NA>                <NA>                <NA>                <NA>
# 16                <NA>                <NA>                <NA>                <NA>

If you've added that "time" variable as I did above, you can also use the "reshape2" package after making an even longer dataset. That extra-long dataset (I've called it "mydf.l" below) might be more useful for subsetting than a wide dataset:

library(reshape2)
mydf.l <- melt(mydf, id.vars=c("num", "time"))
head(mydf.l)
#   num time variable               value
# 1   1    1   placed 2013-02-22 12:14:00
# 2   1    2   placed 2013-03-03 17:32:00
# 3   1    3   placed 2013-04-24 10:13:00
# 4   1    4   placed 2013-04-15 14:51:00
# 5   1    5   placed 2013-04-11 11:56:00
# 6  10    1   placed 2013-02-22 07:30:00
dcast(mydf.l, num ~ variable + time)
#   num            placed_1            placed_2            placed_3            placed_4
# 1   1 2013-02-22 12:14:00 2013-03-03 17:32:00 2013-04-24 10:13:00 2013-04-15 14:51:00
# 2  10 2013-02-22 07:30:00 2013-03-03 17:20:00 2013-04-15 15:22:00 2013-02-17 10:38:00
# 3  11 2013-02-22 14:23:00 2013-04-11 12:51:00 2013-04-15 14:45:00 2013-04-19 10:13:00
# 4  14 2013-03-01 13:45:00 2013-02-22 13:22:00 2013-04-04 15:36:00                <NA>
#              placed_5            placed_6         recovered_1         recovered_2
# 1 2013-04-11 11:56:00                <NA> 2013-02-27 15:14:00 2013-03-07 17:32:00
# 2 2013-04-11 10:09:00 2013-04-24 10:07:00 2013-02-27 14:55:00 2013-03-07 17:20:00
# 3                <NA>                <NA> 2013-02-27 15:50:00 2013-04-14 09:40:00
# 4                <NA>                <NA> 2013-03-08 14:28:00 2013-02-27 15:24:00
#           recovered_3         recovered_4         recovered_5         recovered_6
# 1 2013-04-26 07:47:00 2013-04-19 09:36:00 2013-04-15 12:52:00                <NA>
# 2 2013-04-19 09:48:00 2013-02-22 07:18:00 2013-04-15 13:21:00 2013-04-26 08:23:00
# 3 2013-04-19 08:28:00 2013-04-23 12:01:00                <NA>                <NA>
# 4 2013-04-17 15:04:00                <NA>                <NA>                <NA>

Upvotes: 2

Related Questions