Reputation: 259
I have next question. Its rather simple but I cant figure out where I do wrong. I have one year of observations, but the problem is that in the end of the year after 31.12.16 i have shift to 01.01.17 and then repetitions from 2016 year. This can be clearly seen if you look at lines 8572-8574. Therefore, I want to remove those values(everything after 2017 with 2017). My overall database consists of many years to analyse. Therefore manual removing is not the option.
dato horiginal hour mday mon year wday wk
8569 31.12.2016 64 20 31 11 116 6 53
8570 31.12.2016 70 21 31 11 116 6 53
8571 31.12.2016 71 22 31 11 116 6 53
8572 31.12.2016 71 23 31 11 116 6 53
8573 01.01.2017 78 0 1 0 117 0 53
8574 30.10.2016 46 12 30 9 116 0 44
8575 30.10.2016 38 13 30 9 116 0 44
8576 30.10.2016 35 14 30 9 116 0 44
8577 30.10.2016 36 15 30 9 116 0 44
8578 30.10.2016 46 12 30 9 116 0 44
8579 30.10.2016 38 13 30 9 116 0 44
'data.frame': 8629 obs. of 8 variables:
$ dato : chr "01.01.2016" "01.01.2016" "01.01.2016" "01.01.2016" ...
$ horiginal: num 76 79 78 74 75 71 74 72 71 77 ...
$ hour : int 1 2 3 4 5 6 7 8 9 10 ...
$ mday : int 1 1 1 1 1 1 1 1 1 1 ...
$ mon : int 0 0 0 0 0 0 0 0 0 0 ...
$ year : int 116 116 116 116 116 116 116 116 116 116 ...
$ wday : int 5 5 5 5 5 5 5 5 5 5 ...
$ wk : num 1 1 1 1 1 1 1 1 1 1 ...
The last column "wk" is meant week and it the only increasing column in this dataframe (started from 1). My thought process is to make a loop and remove everything if observation is lower than previous values.
I know that looping here probably is not very efficient way, but nothing comes to my mind so far... so, here is my code:
for (i in 1:NROW(newdf_heat$wk)) {
total.coefs = data.frame()
if (newdf_heat$wk[i+1]< newdf_heat$wk[i]) {
total.coefs = newdf_heat[-c(i:nrow(newdf_heat)),]}
}
Im doing something wrong because nothing happens. I would appreciate any help.
Upvotes: 1
Views: 271
Reputation: 7724
Has you have seveal year, you could the following:
# Extract the cumulative maximum year
my_df$max_year <- cummax(my_df$year)
# Filter those rows which are at least as big as the max_year
my_df[my_df$year >= my_df$max_year, ]
# dato horiginal hour mday mon year wday wk max_year
# 8569 31.12.2016 64 20 31 11 116 6 53 116
# 8570 31.12.2016 70 21 31 11 116 6 53 116
# 8571 31.12.2016 71 22 31 11 116 6 53 116
# 8572 31.12.2016 71 23 31 11 116 6 53 116
# 8573 01.01.2017 78 0 1 0 117 0 53 117
If wk is always increasing you could also use this instead of year, or extract the year as Ronak Shah did it in his answer.
Data
my_df <-
structure(list(dato = c("31.12.2016", "31.12.2016", "31.12.2016", "31.12.2016", "01.01.2017", "30.10.2016", "30.10.2016", "30.10.2016", "30.10.2016", "30.10.2016", "30.10.2016"),
horiginal = c(64L, 70L, 71L, 71L, 78L, 46L, 38L, 35L, 36L, 46L, 38L),
hour = c(20L, 21L, 22L, 23L, 0L, 12L, 13L, 14L, 15L, 12L, 13L),
mday = c(31L, 31L, 31L, 31L, 1L, 30L, 30L, 30L, 30L, 30L, 30L),
mon = c(11L, 11L, 11L, 11L, 0L, 9L, 9L, 9L, 9L, 9L, 9L),
year = c(116L, 116L, 116L, 116L, 117L, 116L, 116L, 116L, 116L, 116L, 116L),
wday = c(6L, 6L, 6L, 6L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
wk = c(53L, 53L, 53L, 53L, 53L, 44L, 44L, 44L, 44L, 44L, 44L)),
class = "data.frame", row.names = c("8569", "8570", "8571", "8572", "8573", "8574", "8575", "8576", "8577", "8578", "8579"))
Upvotes: 3
Reputation: 388817
We can convert the dato
column to Date, extract the year, find out the index of row where "2017" occurs for the first time and remove rows from that index.
df[1:which.max(format(as.Date(df$dato, "%d.%m.%Y"), "%Y") == "2017") - 1,]
# dato horiginal hour mday mon year wday wk
#8569 31.12.2016 64 20 31 11 116 6 53
#8570 31.12.2016 70 21 31 11 116 6 53
#8571 31.12.2016 71 22 31 11 116 6 53
#8572 31.12.2016 71 23 31 11 116 6 53
Or if the year to remove is not "2017" always and the dates are always increasing we can use diff
to find out the first index where there is drop in the date and remove everything after that.
df[1:which.max(diff(as.Date(df$dato, "%d.%m.%Y")) < 0) - 1, ]
Also as mentioned in the post wk
column is always increasing hence, the above diff
logic can be applied for wk
column as well which would save us from converting dato
to Date
df[1:which.max(diff(df$wk) < 0) - 1,]
Upvotes: 2