Reputation: 397
i have the following data frame:
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2009 7 1 2 17.7 18.8 17.8 -16.1 15.5
2009 7 1 3 -17.7 18.6 18.1 15.9 15.4
2009 7 1 4 16.7 18.6 -17.6 14.3 12.8
2009 7 2 1 -15.6 17.6 16.3 13.2 11.8
2009 7 2 2 13.7 15.7 12.0 -11.1 -12.9
2009 7 2 3 13.7 15.8 11.9 11.1 12.9
2009 7 2 4 -13.9 16.1 -12.1 11.2 12.9
2009 8 1 1 13.8 16.0 12.2 11.2 12.8
2009 8 1 2 -13.7 16.3 11.6 10.6 12.6
2009 8 1 3 13.7 -15.8 11.9 11.0 12.7
2009 8 1 4 13.8 16.0 12.1 11.2 12.9
2009 8 2 1 17.6 -17.6 17.3 16.5 17.1
2009 8 2 2 17.7 17.6 17.3 16.8 17.4
2009 8 2 3 15.8 16.0 15.1 15.0 15.5
2009 8 2 4 -15.4 15.6 14.5 14.6 15.1
2009 9 1 1 14.7 15.0 13.8 14.0 14.5
2009 9 1 2 15.3 15.4 14.3 14.6 15.0
2009 9 1 3 15.3 15.6 14.4 14.5 15.0
2009 9 1 4 14.9 15.7 13.7 13.8 14.5
My goal is create 10 columns which contains the variation (Pi-Pi-1) of rrp_nsw
rrp_qld
rrp_sa
rrp_tas
rrp_vic
. For example, if the variation of the variable is positive then the column rrp_nsw_RS_pos
is created if else rrp_nsw_RS_neg
is created.
Using only one of the variables i want to explain my desired result.
year month day fivemin rrp_nsw rrp_nsw_RV_pos rrp_nsw_RV_neg
2009 7 1 1 16.9 NA NA
2009 7 1 2 17.7 0.8 NA
2009 7 1 3 -17.7 NA 35.4
2009 7 1 4 16.7 34.4 NA
2009 7 2 1 -15.6 NA 32.3
2009 7 2 2 13.7 29.3 NA
2009 7 2 3 13.7 0 NA
2009 7 2 4 -13.9 NA 27.6
2009 8 1 1 13.8 27.7 NA
2009 8 1 2 -13.7 NA 27.5
2009 8 1 3 13.7 27.4 NA
2009 8 1 4 13.8 0.1 NA
2009 8 2 1 17.6 3.8 NA
2009 8 2 2 17.7 0.1 NA
2009 8 2 3 15.8 NA 1.9
2009 8 2 4 -15.4 NA 31.2
2009 9 1 1 14.7 30.1 NA
2009 9 1 2 15.3 0.6 NA
2009 9 1 3 15.3 0 NA
2009 9 1 4 14.9 NA 0.4
The code that i have thought is the next
if(){
Base<-Base %>%
arrange(year, month, day, fivemin) %>%
group_by(year, month, day) %>%
mutate_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic), list(RV_pos = ~ c(NA, abs(diff(.))))) %>%
} else {
Base<-Base %>%
mutate_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic), list(RV_neg = ~ c(NA, diff(.)))) %>%
ungroup()
}
I believe that I can create the columns with dplyr but i am a new user and for me the easiest was to think about If and If else. I would be gratefull if someones can help me with my code
Upvotes: 1
Views: 63
Reputation: 6921
With dplyr
, you can mutate all columns at once using tidy select helpers like starts_with
and ends_with
in mutate_at
.
You can apply multiple functions at once in a mutate_at
by passing a named list of functions. New columns will be created with the function name appended, which makes it easier to get your desired result.
For brevity, I'll keep only 2 rrp_*
columns but it will work with any number of columns starting with "rrp_"
.
df %>%
select(year, month, day, rrp_nsw, rrp_qld) %>% # Optional: keeping only 2
mutate_at(
vars(starts_with("rrp")), # mutating all variables starting with rrp (you can also enumerate them)
list(RV = ~ c(NA, diff(.))) # storing the differences in columns *_RV
) %>%
mutate_at(
vars(ends_with("RV")), # mutating the columns *_RV just created
list(pos = ~ ifelse(. >= 0, ., NA), # splitting pos and neg
neg = ~ ifelse(. < 0, ., NA))
) %>%
select(- ends_with("_RV")) # removing temp *_RV variable
Result:
year month day rrp_nsw rrp_qld rrp_nsw_RV_pos rrp_qld_RV_pos rrp_nsw_RV_neg rrp_qld_RV_neg
1 2009 7 1 16.9 17.6 NA NA NA NA
2 2009 7 1 17.7 18.8 0.8 1.2 NA NA
3 2009 7 1 -17.7 18.6 NA NA -35.4 -0.2
4 2009 7 1 16.7 18.6 34.4 0.0 NA NA
5 2009 7 2 -15.6 17.6 NA NA -32.3 -1.0
6 2009 7 2 13.7 15.7 29.3 NA NA -1.9
7 2009 7 2 13.7 15.8 0.0 0.1 NA NA
8 2009 7 2 -13.9 16.1 NA 0.3 -27.6 NA
9 2009 8 1 13.8 16.0 27.7 NA NA -0.1
10 2009 8 1 -13.7 16.3 NA 0.3 -27.5 NA
11 2009 8 1 13.7 -15.8 27.4 NA NA -32.1
12 2009 8 1 13.8 16.0 0.1 31.8 NA NA
13 2009 8 2 17.6 -17.6 3.8 NA NA -33.6
14 2009 8 2 17.7 17.6 0.1 35.2 NA NA
15 2009 8 2 15.8 16.0 NA NA -1.9 -1.6
16 2009 8 2 -15.4 15.6 NA NA -31.2 -0.4
17 2009 9 1 14.7 15.0 30.1 NA NA -0.6
18 2009 9 1 15.3 15.4 0.6 0.4 NA NA
19 2009 9 1 15.3 15.6 0.0 0.2 NA NA
20 2009 9 1 14.9 15.7 NA 0.1 -0.4 NA
ALternatively, you would end up with cleaner code using some small helper functions.
#' Take positive differences
pos_diff <- function(vec) {
delta <- c(NA, diff(vec))
ifelse(delta >= 0, delta, NA)
}
#' Take negative differences
neg_diff <- function(vec) {
delta <- c(NA, diff(vec))
ifelse(delta < 0, delta, NA)
}
This makes the code much neater:
df %>%
mutate_at(vars(starts_with("rrp")),
list(RV_pos = pos_diff, RV_neg = neg_diff))
Upvotes: 1
Reputation: 486
As an alternative to the first answer:
variables_to_edit <- c("rrp_nsw","rrp_qld","rrp_sa","rrp_tas","rrp_vic")
for (vars in variables_to_edit) {
# Create the negative column
df[ncol(df)+1] <- c(NA, ifelse(head(df[,vars], -1) - tail(df[,vars],-1) > 0,
head(df[,vars], -1) - tail(df[,vars],-1), NA))
# Create the positive column
df[ncol(df)+1] <- c(NA, ifelse(head(df[,vars], -1) + tail(df[,vars],-1) > 0,
head(df[,vars], -1) + tail(df[,vars],-1), NA))
# Update the names of the new columns
names(df)[c(ncol(df)-1, ncol(df))] <- paste0(vars, c("_RV_pos","_RV_neg"))
}
Upvotes: 0
Reputation: 72828
Create a temporary vector first and ifelse
after.
tmp <- with(dat, rrp_nsw - c(NA, rrp_nsw[-length(rrp_nsw)]))
dat <- transform(dat,
rrp_nsw_RV_pos=ifelse(dat$rrp_nsw > 0, tmp, NA),
rrp_nsw_RV_neg=ifelse(dat$rrp_nsw < 0, tmp, NA))
dat[-(6:9)]
# year month day fivemin rrp_nsw rrp_nsw_RV_pos rrp_nsw_RV_neg
# 1 2009 7 1 1 16.9 NA NA
# 2 2009 7 1 2 17.7 0.8 NA
# 3 2009 7 1 3 -17.7 NA -35.4
# 4 2009 7 1 4 16.7 34.4 NA
# 5 2009 7 2 1 -15.6 NA -32.3
# 6 2009 7 2 2 13.7 29.3 NA
# 7 2009 7 2 3 13.7 0.0 NA
# 8 2009 7 2 4 -13.9 NA -27.6
# 9 2009 8 1 1 13.8 27.7 NA
# 10 2009 8 1 2 -13.7 NA -27.5
# 11 2009 8 1 3 13.7 27.4 NA
# 12 2009 8 1 4 13.8 0.1 NA
# 13 2009 8 2 1 17.6 3.8 NA
# 14 2009 8 2 2 17.7 0.1 NA
# 15 2009 8 2 3 15.8 -1.9 NA
# 16 2009 8 2 4 -15.4 NA -31.2
# 17 2009 9 1 1 14.7 30.1 NA
# 18 2009 9 1 2 15.3 0.6 NA
# 19 2009 9 1 3 15.3 0.0 NA
# 20 2009 9 1 4 14.9 -0.4 NA
dat <- structure(list(year = c(2009L, 2009L, 2009L, 2009L, 2009L, 2009L,
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L,
2009L, 2009L, 2009L, 2009L, 2009L), month = c(7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L
), day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L), fivemin = c(1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
rrp_nsw = c(16.9, 17.7, -17.7, 16.7, -15.6, 13.7, 13.7, -13.9,
13.8, -13.7, 13.7, 13.8, 17.6, 17.7, 15.8, -15.4, 14.7, 15.3,
15.3, 14.9), rrp_qld = c(17.6, 18.8, 18.6, 18.6, 17.6, 15.7,
15.8, 16.1, 16, 16.3, -15.8, 16, -17.6, 17.6, 16, 15.6, 15,
15.4, 15.6, 15.7), rrp_sa = c(16.7, 17.8, 18.1, -17.6, 16.3,
12, 11.9, -12.1, 12.2, 11.6, 11.9, 12.1, 17.3, 17.3, 15.1,
14.5, 13.8, 14.3, 14.4, 13.7), rrp_tas = c(15.7, -16.1, 15.9,
14.3, 13.2, -11.1, 11.1, 11.2, 11.2, 10.6, 11, 11.2, 16.5,
16.8, 15, 14.6, 14, 14.6, 14.5, 13.8), rrp_vic = c(15.5,
15.5, 15.4, 12.8, 11.8, -12.9, 12.9, 12.9, 12.8, 12.6, 12.7,
12.9, 17.1, 17.4, 15.5, 15.1, 14.5, 15, 15, 14.5)), row.names = c(NA,
-20L), class = "data.frame")
Upvotes: 0