Reputation: 13
I have a row of data that I would like to break out into 5 rows based on the value.
Attached is an image of what I'd like to achieve. I don't know where to start. The cost is divided evenly over the 7 days.
UPDATE: I can now do this with the code below, but the output has two problems:
1) it adds a last column called NA, and 2) it replaces all the values of the first variable with NA.
`df<-read.csv("C:\\TEST1.CSV")
df1<-df[FALSE,];
#For every week
for (i in 1:dim(df)[1]){
#For all seven days
for (j in 1:7){
df1<-rbind(df1,df[i,]/7);
}
}
df2<-data.frame(seq(as.Date("2011-05-02"), as.Date("2015-11-22"), by="days"),df1);
colnames(df2)<-colnames(df)
write.csv(df2, file = "C:TEST1RES.CSV")`
Error observed: in the output file, the first variable column (i.e. column 3) is always NA. How could this happen?
Upvotes: 1
Views: 5821
Reputation: 1237
You can use a simple loop. By using [FALSE,] and dim(df)[1], my script will adapt to bigger dimensions.
#Your original dataset
df<-matrix(c(5,3,1,6,4,2),2,3);
#New dataset
df1<-df[FALSE,];
#For every week
for (i in 1:dim(df)[1]){
#For all seven days
for (j in 1:7){
df1<-rbind(df1,df[i,]/7);
}
}
df2<-data.frame(seq(as.Date("2015-05-02"), as.Date("2015-05-15"), by="days"),df1);
colnames(df2)<-c("cal_day","cost a","cost b","cost c")
If you have loaded your dataset, you should use:
colnames(df2)<-colnames(df)
Here is the result:
df2;
cal_day cost a cost b cost c
1 2015-05-02 0.7142857 0.1428571 0.5714286
2 2015-05-03 0.7142857 0.1428571 0.5714286
3 2015-05-04 0.7142857 0.1428571 0.5714286
4 2015-05-05 0.7142857 0.1428571 0.5714286
5 2015-05-06 0.7142857 0.1428571 0.5714286
6 2015-05-07 0.7142857 0.1428571 0.5714286
7 2015-05-08 0.7142857 0.1428571 0.5714286
8 2015-05-09 0.4285714 0.8571429 0.2857143
9 2015-05-10 0.4285714 0.8571429 0.2857143
10 2015-05-11 0.4285714 0.8571429 0.2857143
11 2015-05-12 0.4285714 0.8571429 0.2857143
12 2015-05-13 0.4285714 0.8571429 0.2857143
13 2015-05-14 0.4285714 0.8571429 0.2857143
14 2015-05-15 0.4285714 0.8571429 0.2857143
EDIT to answer the new error (read the comments)
It is impossible for me to solve your error without your dataset.
However, it appears that you have more weeks than you expect. In order to have 1666 rows, you would need 238 weeks. The dates you input in seq(...) do not even give a number of days divisible by 7, so they are wrong. The last date in your original dataset must be later than the "2015-08-31" you put!
If you were to respect the 1666 days (7 times the number of weeks in your original dataset), that would be from "2011-05-02" to "2015-11-22".
length(seq(as.Date("2011-05-02"), as.Date("2015-11-22"), by="days"));
Originally you said you had 52 weeks. That would give 364 rows.
You are making a mistake in the code that we cannot fix because we cannot see the data. Just look at the original dataset and respect the same dates and it will work.
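One way to avoid a mismatch between the date range and the number of rows is to derive the dates from the row count instead of typing an end date by hand. A minimal sketch, assuming the 238 weeks and the "2011-05-02" start mentioned above (n_weeks and daily_dates are illustrative names, not part of your script):

```r
# Assumption: 238 weekly rows, the first week starting on 2011-05-02
n_weeks <- 238

# Generate exactly 7 dates per week, so the length always matches the expanded data
daily_dates <- seq(as.Date("2011-05-02"), by = "day", length.out = 7 * n_weeks)

length(daily_dates)   # number of generated dates (7 * n_weeks)
range(daily_dates)    # first and last generated date
```

With your real data you would probably replace the hard-coded 238 with the number of rows of the original dataset, e.g. dim(df)[1].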
EDIT 2:
You forgot to get rid of the first column. If you leave it in, the division is also applied to the dates, which produces NAs.
Use this code instead:
df<-TEST1;
#Get rid of the first column
df1<-df[-1];
#New dataset
df2<-df1[FALSE,];
#For every week
for (i in 1:dim(df)[1]){
#For all seven days
for (j in 1:7){
df2<-rbind(df2,df1[i,]/7);
}
}
df3<-data.frame(seq(as.Date("2011-05-02"), as.Date("2015-11-22"), by="days"),df2);
colnames(df3)<-colnames(df);
head(df3);
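To illustrate where the NAs in the first column come from, here is a small sketch on made-up data. It assumes the first column was read as a factor, which is what read.csv did by default for text columns before R 4.0; the data frame weekly and its column names are hypothetical stand-ins for TEST1.

```r
# Hypothetical two-week stand-in for TEST1; the first column is a factor
weekly <- data.frame(week = factor(c("2011-05-02", "2011-05-09")),
                     cost_a = c(5, 6),
                     cost_b = c(3, 4))

# Dividing the whole row also hits the factor column:
# R warns that '/' is not meaningful for factors and returns NA there
with_dates <- suppressWarnings(weekly[1, ] / 7)
is.na(with_dates$week)

# Dropping the first column first leaves only numeric columns to divide
costs_only <- weekly[1, -1] / 7
anyNA(costs_only)
```

In R 4.0 or later, where text columns are read as character by default, the same division would be expected to stop with an error instead of producing NA, so removing the first column is needed either way.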
Upvotes: 2
Reputation: 83215
An alternative solution using the data.table package:
library(data.table)
df <- setDT(df)[, .(dates = as.Date(date:(date+6), origin="1970-01-01"),
cost_a = cost_a/7,
cost_b = cost_b/7,
cost_c = cost_c/7),
by = date][, date:=NULL]
This gives:
> df
dates cost_a cost_b cost_c
1: 2015-05-02 0.7142857 0.4285714 0.1428571
2: 2015-05-03 0.7142857 0.4285714 0.1428571
3: 2015-05-04 0.7142857 0.4285714 0.1428571
4: 2015-05-05 0.7142857 0.4285714 0.1428571
5: 2015-05-06 0.7142857 0.4285714 0.1428571
6: 2015-05-07 0.7142857 0.4285714 0.1428571
7: 2015-05-08 0.7142857 0.4285714 0.1428571
8: 2015-05-09 0.8571429 0.5714286 0.2857143
9: 2015-05-10 0.8571429 0.5714286 0.2857143
10: 2015-05-11 0.8571429 0.5714286 0.2857143
11: 2015-05-12 0.8571429 0.5714286 0.2857143
12: 2015-05-13 0.8571429 0.5714286 0.2857143
13: 2015-05-14 0.8571429 0.5714286 0.2857143
14: 2015-05-15 0.8571429 0.5714286 0.2857143
Explanation:
With setDT(df) you convert the data frame to a data.table (which is an enhanced data frame). With by = date you group by date. Supposing each date in the original data frame is the starting date of a week, you create the complete week with as.Date(date:(date+6), origin="1970-01-01").
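The date expression can be tried on its own, outside data.table. A small sketch, using one of the week-start dates from the example data (d, week_days and same_week are illustrative names):

```r
d <- as.Date("2015-05-02")   # example week start

# The colon operator drops the Date class and returns plain day counts since 1970-01-01
d:(d + 6)

# as.Date() with an origin turns those counts back into dates
week_days <- as.Date(d:(d + 6), origin = "1970-01-01")
week_days

# An alternative that stays in Date form throughout
same_week <- seq(d, by = "day", length.out = 7)
```

Both approaches should give the same seven dates; seq() is simply an alternative spelling, not something the solution above requires.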
Used data:
df <- data.frame(date = c("2015-05-02", "2015-05-09"),
cost_a = c(5, 6),
cost_b = c(3, 4),
cost_c = c(1, 2))
df$date <- as.Date(df$date)
Upvotes: 3
Reputation: 482
Here's one solution you could try. Note that it might not work for others with similar problems because it assumes no dates are skipped.
1. Your data.
To demonstrate, I'll recreate a section of your data in R.
cost_a <- c(5, 6)
cost_b <- c(3, 4)
date <- c("2015-05-02", "2015-05-09")
df <- data.frame(date, cost_a, cost_b)
2. Duplicating rows.
You can use the rep function to replicate rows.
df.expanded <- df[rep(row.names(df), 7), 1:3]
The number 7 is how many copies of each row you want (the original plus 6 duplicates). The 1:3 selects the first to third columns. If you have 10 columns of costs, then it would be 1:11.
3. Ordering the rows
By default, rep repeats the whole sequence of rows (1, 2, 1, 2, ...), which is not the order you want, so you can quickly order them by date:
df.sorted <- df.expanded[order(df.expanded$date),]
4. Get the new values
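You can divide the cost columns by 7 like this (the date column is left out, because it is not numeric and cannot be divided):
df.divided <- df.sorted
df.divided[-1] <- df.sorted[-1]/7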
5. Making the new dates.
You can use the seq command to automatically create a list of dates.
date <- seq(as.Date("2015-05-02"), as.Date("2015-05-15"), by="days")
This assumes no days are skipped. Enter the first and last dates in the places where I've put the dates. You can also choose to make the list by other divisions besides days, if necessary.
6. Merge the new dates with the divided values.
df.divided$date <- date
Now print the result to see if you get what you want:
df.divided
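As a possible shortcut, the duplicating and ordering steps can be combined by passing each = 7 to rep, which repeats every row 7 times in a row so no sorting is needed. A minimal sketch of the whole procedure on the same example data, still assuming no dates are skipped (idx and daily are illustrative names):

```r
# Same example data as in step 1
df <- data.frame(date = c("2015-05-02", "2015-05-09"),
                 cost_a = c(5, 6),
                 cost_b = c(3, 4))

# each = 7 gives 1,1,1,1,1,1,1,2,2,... so the rows are already grouped by week
idx <- rep(seq_len(nrow(df)), each = 7)
daily <- df[idx, ]

# Divide only the cost columns; the date column is not numeric
daily[-1] <- daily[-1] / 7

# One date per row, starting at the first week's date (assumes consecutive weeks)
daily$date <- seq(as.Date(df$date[1]), by = "day", length.out = nrow(daily))
rownames(daily) <- NULL

daily
```

Deriving the number of dates from nrow(daily) means the end date does not have to be typed in by hand.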
Upvotes: 4