Reputation: 6796
I have a data.frame like below.
toolid startdate enddate stage
abc 1-Jan-13 5-Jan-13 production
abc 6-Jan-13 10-Jan-13 down
xyz 3-Jan-13 8-Jan-13 production
xyz 9-Jan-13 15-Jan-13 down
I want to transform the data.frame into the format below. I am trying to combine columns 'startdate'
and 'enddate'
from above data.frame into a single column called 'date'
below. Original data that I have has over a few thousand rows across many toolids
and many stages. I already found a way to do this using SQL, but would prefer a R solution. I have started by melting the data as shown in the code below.
toolid date stage
abc 1-Jan-13 production
abc 2-Jan-13 production
abc 3-Jan-13 production
abc 4-Jan-13 production
abc 5-Jan-13 production
abc 6-Jan-13 down
abc 7-Jan-13 down
abc 8-Jan-13 down
abc 9-Jan-13 down
abc 10-Jan-13 down
xyz 3-Jan-13 production
xyz 4-Jan-13 production
xyz 5-Jan-13 production
xyz 6-Jan-13 production
xyz 7-Jan-13 production
xyz 8-Jan-13 production
xyz 9-Jan-13 down
xyz 10-Jan-13 down
xyz 11-Jan-13 down
xyz 12-Jan-13 down
xyz 13-Jan-13 down
xyz 14-Jan-13 down
xyz 15-Jan-13 down
R code
startdate=c('1-Jan-13','6-Jan-13','3-Jan-13','9-Jan-13')
enddate=c('5-Jan-13', '10-Jan-13', '8-Jan-13', '15-Jan-13')
toolid=c('abc', 'abc', 'xyz', 'xyz')
stage=c('production', 'down', 'production', 'down')
data=data.frame(toolid,startdate,enddate,stage)
require(reshape2)
newdata=melt(data,id.vars=c('toolid','stage'))
update: coping code from @Ananda Mahto answer below and adding few lines of code to give a pivot table kind of output
## Convert "startdate" and "enddate" to date objects
data$startdate <- as.Date(data$startdate, format="%d-%b-%y")
data$enddate <- as.Date(data$enddate, format="%d-%b-%y")
## Use `seq` to create the date sequence, and manually recreate
## your dataframe. `do.call(rbind, ...) to put it back together
ddd=do.call(rbind, lapply(sequence(nrow(data)), function(x) {
data.frame(toolid = data$toolid[x],
date = seq(data$startdate[x], data$enddate[x], by = 1),
stage = data$stage[x])
}))
ddd
toolid date stage
1 abc 2013-01-01 production
2 abc 2013-01-02 production
3 abc 2013-01-03 production
4 abc 2013-01-04 production
5 abc 2013-01-05 production
6 abc 2013-01-06 down
7 abc 2013-01-07 down
8 abc 2013-01-08 down
9 abc 2013-01-09 down
10 abc 2013-01-10 down
11 xyz 2013-01-03 production
12 xyz 2013-01-04 production
13 xyz 2013-01-05 production
14 xyz 2013-01-06 production
15 xyz 2013-01-07 production
16 xyz 2013-01-08 production
17 xyz 2013-01-09 down
18 xyz 2013-01-10 down
19 xyz 2013-01-11 down
20 xyz 2013-01-12 down
21 xyz 2013-01-13 down
22 xyz 2013-01-14 down
23 xyz 2013-01-15 down
ddd1=dcast(ddd,date~stage)
ddd1
date down production
1 2013-01-01 0 1
2 2013-01-02 0 1
3 2013-01-03 0 2
4 2013-01-04 0 2
5 2013-01-05 0 2
6 2013-01-06 1 1
7 2013-01-07 1 1
8 2013-01-08 1 1
9 2013-01-09 2 0
10 2013-01-10 2 0
11 2013-01-11 1 0
12 2013-01-12 1 0
13 2013-01-13 1 0
14 2013-01-14 1 0
15 2013-01-15 1 0
Upvotes: 0
Views: 336
Reputation: 193677
I'm sure there are more "correct" ways to do this, but this is what came to my mind quickly.
First, convert "startdate" and "enddate" to date objects
data$startdate <- as.Date(data$startdate, format="%d-%b-%y")
data$enddate <- as.Date(data$enddate, format="%d-%b-%y")
Then, use seq
to create the date sequence, and manually recreate your data.frame
. Use `do.call(rbind, ...) to put it back together.
ddd <- do.call(rbind, lapply(sequence(nrow(data)), function(x) {
data.frame(toolid = data$toolid[x],
date = seq(data$startdate[x], data$enddate[x], by = 1),
stage = data$stage[x])
}))
ddd
# toolid date stage
# 1 abc 2013-01-01 production
# 2 abc 2013-01-02 production
# 3 abc 2013-01-03 production
# 4 abc 2013-01-04 production
# 5 abc 2013-01-05 production
# 6 abc 2013-01-06 down
# 7 abc 2013-01-07 down
# 8 abc 2013-01-08 down
# 9 abc 2013-01-09 down
# 10 abc 2013-01-10 down
# 11 xyz 2013-01-03 production
# 12 xyz 2013-01-04 production
# 13 xyz 2013-01-05 production
# 14 xyz 2013-01-06 production
# 15 xyz 2013-01-07 production
# 16 xyz 2013-01-08 production
# 17 xyz 2013-01-09 down
# 18 xyz 2013-01-10 down
# 19 xyz 2013-01-11 down
# 20 xyz 2013-01-12 down
# 21 xyz 2013-01-13 down
# 22 xyz 2013-01-14 down
# 23 xyz 2013-01-15 down
Finally, looking at where you say you want to end up, you can stick to base R all the way and use table
. I've put it in as.data.frame.matrix()
because I've assumed you want a data.frame
as the result:
as.data.frame.matrix(table(ddd[-1]))
# down production
# 2013-01-01 0 1
# 2013-01-02 0 1
# 2013-01-03 0 2
# 2013-01-04 0 2
# 2013-01-05 0 2
# 2013-01-06 1 1
# 2013-01-07 1 1
# 2013-01-08 1 1
# 2013-01-09 2 0
# 2013-01-10 2 0
# 2013-01-11 1 0
# 2013-01-12 1 0
# 2013-01-13 1 0
# 2013-01-14 1 0
# 2013-01-15 1 0
Upvotes: 4