Reputation: 1296
I am working with a hourly dataset for 1 year and trying to calculate sum of weekday (Mon - Fri) and weekend (sat, sun) values and add to a new row at the bottom of the data set. my data frame looks like this:
I have created the sum for all week with the following command:
## calculate column sum
df[366,(3:27)] <- colSums(df[,3:27], na.rm = T)
Then calculated the number of missing values in each column with this:
## calculate number of missing values in a column
NA.find <- function(x) length(which(is.na(x))) #function for finding missing values
myNumCols <- which(unlist(lapply(df, is.numeric)))
df[(nrow(df) + 1), myNumCols] <- sapply(df[, myNumCols],NA.find )
Now I am trying to calculate the weekday and weekend sum between (Column v1:total) and also number of missing values and add the the bottom rows like I did for the all days.
I know that I need to somehow define my date so that R can read it and then use some how to subset weekday and weekend, I figure out how to subset individual like this:
e = colsplit(df1$date,split=" ",names=c("day1","day2","month"))
df2 = cbind(df1[,c("type","date")],day=e[1],cdate=e[2],month=e[3],df1[,3:ncol(df1)])
df3 = subset(df2, day1 == "Mon")
But I am not sure how can subset it say Mon:Friday and again Saturday:Sunday?
What I am trying to achieve is to create 4 new rows from (368:371) and add weekday total, weekday number of missing values, weekend total, weekend number of missing values.
Many thanks, Ayan
Upvotes: 1
Views: 2672
Reputation: 60492
A few comments.
Don't add column sums to your data frame. For example, in your code, you have:
df[366,(3:27)] <- colSums(df[,3:27], na.rm = T)
Suppose we subset your data frame, to every weekend. You column sums are wrong. Store them as a separate vector:
year_sums =colSums(df[,3:27], na.rm = TRUE)
Since you no longer have column sums in your data frame, it's easier to query. So to determine the number of missing values, do something like:
colSums(is.na(dd[,3:27])
R treats TRUE as 1 and FALSE as 0.
Now to answer your question. The clever way is to use R's date object or use the lubridate package. However, I would suggest you start with something a bit easier. Just create a new column and subset. In your example, the first day of the week is Friday. So
##M for mid week
##W for week end
df$type = c("M", "W", "W", "M", "M", "M", "M")
creates a column identifying if the data is weekend or mid-week. So to subset, just:
df[df$type == "M",]
Upvotes: 2