Eve Chanatasig
Eve Chanatasig

Reputation: 397

Using R, How can I create new columns using if?

i have the following data frame:

year month   day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009     7     1       1    16.9    17.6   16.7    15.7    15.5
2009     7     1       2    17.7    18.8   17.8    -16.1    15.5
2009     7     1       3    -17.7    18.6   18.1    15.9    15.4
2009     7     1       4    16.7    18.6   -17.6    14.3    12.8
2009     7     2       1    -15.6    17.6   16.3    13.2    11.8
2009     7     2       2    13.7    15.7   12.0    -11.1    -12.9
2009     7     2       3    13.7    15.8   11.9    11.1    12.9
2009     7     2       4    -13.9    16.1   -12.1    11.2    12.9
2009     8     1       1    13.8    16.0   12.2    11.2    12.8
2009     8     1       2    -13.7    16.3   11.6    10.6    12.6
2009     8     1       3    13.7    -15.8   11.9    11.0    12.7
2009     8     1       4    13.8    16.0   12.1    11.2    12.9
2009     8     2       1    17.6    -17.6   17.3    16.5    17.1
2009     8     2       2    17.7    17.6   17.3    16.8    17.4
2009     8     2       3    15.8    16.0   15.1    15.0    15.5
2009     8     2       4    -15.4    15.6   14.5    14.6    15.1
2009     9     1       1    14.7    15.0   13.8    14.0    14.5
2009     9     1       2    15.3    15.4   14.3    14.6    15.0
2009     9     1       3    15.3    15.6   14.4    14.5    15.0
2009     9     1       4    14.9    15.7   13.7    13.8    14.5

My goal is create 10 columns which contains the variation (Pi-Pi-1) of rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic. For example, if the variation of the variable is positive then the column rrp_nsw_RS_pos is created if else rrp_nsw_RS_neg is created.

Using only one of the variables i want to explain my desired result.

year month   day fivemin rrp_nsw rrp_nsw_RV_pos rrp_nsw_RV_neg 
2009     7     1       1    16.9        NA             NA  
2009     7     1       2    17.7        0.8            NA
2009     7     1       3    -17.7       NA           35.4
2009     7     1       4    16.7       34.4            NA
2009     7     2       1    -15.6       NA             32.3 
2009     7     2       2    13.7       29.3            NA
2009     7     2       3    13.7         0             NA
2009     7     2       4    -13.9        NA           27.6
2009     8     1       1    13.8       27.7            NA
2009     8     1       2    -13.7        NA             27.5
2009     8     1       3    13.7        27.4           NA
2009     8     1       4    13.8        0.1            NA 
2009     8     2       1    17.6        3.8            NA
2009     8     2       2    17.7        0.1            NA
2009     8     2       3    15.8        NA             1.9
2009     8     2       4    -15.4       NA            31.2  
2009     9     1       1    14.7        30.1           NA
2009     9     1       2    15.3        0.6            NA
2009     9     1       3    15.3         0             NA
2009     9     1       4    14.9         NA            0.4

The code that i have thought is the next

if(){
  Base<-Base %>%
    arrange(year, month, day, fivemin) %>%
    group_by(year, month, day) %>%
    mutate_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic), list(RV_pos = ~ c(NA, abs(diff(.))))) %>%
    } else {
  Base<-Base %>%
    mutate_at(vars(rrp_nsw, rrp_qld, rrp_sa, rrp_tas, rrp_vic), list(RV_neg = ~ c(NA, diff(.)))) %>%
    ungroup()
}

I believe that I can create the columns with dplyr but i am a new user and for me the easiest was to think about If and If else. I would be gratefull if someones can help me with my code

Upvotes: 1

Views: 63

Answers (3)

asachet
asachet

Reputation: 6921

With dplyr, you can mutate all columns at once using tidy select helpers like starts_with and ends_with in mutate_at.

You can apply multiple functions at once in a mutate_at by passing a named list of functions. New columns will be created with the function name appended, which makes it easier to get your desired result.

For brevity, I'll keep only 2 rrp_* columns but it will work with any number of columns starting with "rrp_".


df %>%
  select(year, month, day, rrp_nsw, rrp_qld) %>% # Optional: keeping only 2
  mutate_at(
    vars(starts_with("rrp")), # mutating all variables starting with rrp (you can also enumerate them)
    list(RV = ~ c(NA, diff(.))) # storing the differences in columns *_RV
  ) %>% 
  mutate_at(
    vars(ends_with("RV")), # mutating the columns *_RV just created 
    list(pos = ~ ifelse(. >= 0, ., NA), # splitting pos and neg 
         neg = ~ ifelse(. <  0, ., NA)) 
  ) %>%
  select(- ends_with("_RV")) # removing temp *_RV variable

Result:

   year month day rrp_nsw rrp_qld rrp_nsw_RV_pos rrp_qld_RV_pos rrp_nsw_RV_neg rrp_qld_RV_neg
1  2009     7   1    16.9    17.6             NA             NA             NA             NA
2  2009     7   1    17.7    18.8            0.8            1.2             NA             NA
3  2009     7   1   -17.7    18.6             NA             NA          -35.4           -0.2
4  2009     7   1    16.7    18.6           34.4            0.0             NA             NA
5  2009     7   2   -15.6    17.6             NA             NA          -32.3           -1.0
6  2009     7   2    13.7    15.7           29.3             NA             NA           -1.9
7  2009     7   2    13.7    15.8            0.0            0.1             NA             NA
8  2009     7   2   -13.9    16.1             NA            0.3          -27.6             NA
9  2009     8   1    13.8    16.0           27.7             NA             NA           -0.1
10 2009     8   1   -13.7    16.3             NA            0.3          -27.5             NA
11 2009     8   1    13.7   -15.8           27.4             NA             NA          -32.1
12 2009     8   1    13.8    16.0            0.1           31.8             NA             NA
13 2009     8   2    17.6   -17.6            3.8             NA             NA          -33.6
14 2009     8   2    17.7    17.6            0.1           35.2             NA             NA
15 2009     8   2    15.8    16.0             NA             NA           -1.9           -1.6
16 2009     8   2   -15.4    15.6             NA             NA          -31.2           -0.4
17 2009     9   1    14.7    15.0           30.1             NA             NA           -0.6
18 2009     9   1    15.3    15.4            0.6            0.4             NA             NA
19 2009     9   1    15.3    15.6            0.0            0.2             NA             NA
20 2009     9   1    14.9    15.7             NA            0.1           -0.4             NA

ALternatively, you would end up with cleaner code using some small helper functions.

#' Take positive differences
pos_diff <- function(vec) {
  delta <- c(NA, diff(vec))
  ifelse(delta >= 0, delta, NA)
}

#' Take negative differences
neg_diff <- function(vec) {
  delta <- c(NA, diff(vec))
  ifelse(delta < 0, delta, NA)
}

This makes the code much neater:

df %>%
  mutate_at(vars(starts_with("rrp")), 
            list(RV_pos = pos_diff, RV_neg = neg_diff))

Upvotes: 1

JaiPizGon
JaiPizGon

Reputation: 486

As an alternative to the first answer:

variables_to_edit <- c("rrp_nsw","rrp_qld","rrp_sa","rrp_tas","rrp_vic")
for (vars in variables_to_edit) {
  # Create the negative column
  df[ncol(df)+1] <- c(NA, ifelse(head(df[,vars], -1) - tail(df[,vars],-1) > 0,
                           head(df[,vars], -1) - tail(df[,vars],-1), NA))
  # Create the positive column
  df[ncol(df)+1] <- c(NA, ifelse(head(df[,vars], -1) + tail(df[,vars],-1) > 0,
                          head(df[,vars], -1) + tail(df[,vars],-1), NA))
  # Update the names of the new columns
  names(df)[c(ncol(df)-1, ncol(df))] <- paste0(vars, c("_RV_pos","_RV_neg"))
}

Upvotes: 0

jay.sf
jay.sf

Reputation: 72828

Create a temporary vector first and ifelse after.

tmp <- with(dat, rrp_nsw - c(NA, rrp_nsw[-length(rrp_nsw)]))

dat <- transform(dat, 
                 rrp_nsw_RV_pos=ifelse(dat$rrp_nsw > 0, tmp, NA),
                 rrp_nsw_RV_neg=ifelse(dat$rrp_nsw < 0, tmp, NA))
dat[-(6:9)]
#    year month day fivemin rrp_nsw rrp_nsw_RV_pos rrp_nsw_RV_neg
# 1  2009     7   1       1    16.9             NA             NA
# 2  2009     7   1       2    17.7            0.8             NA
# 3  2009     7   1       3   -17.7             NA          -35.4
# 4  2009     7   1       4    16.7           34.4             NA
# 5  2009     7   2       1   -15.6             NA          -32.3
# 6  2009     7   2       2    13.7           29.3             NA
# 7  2009     7   2       3    13.7            0.0             NA
# 8  2009     7   2       4   -13.9             NA          -27.6
# 9  2009     8   1       1    13.8           27.7             NA
# 10 2009     8   1       2   -13.7             NA          -27.5
# 11 2009     8   1       3    13.7           27.4             NA
# 12 2009     8   1       4    13.8            0.1             NA
# 13 2009     8   2       1    17.6            3.8             NA
# 14 2009     8   2       2    17.7            0.1             NA
# 15 2009     8   2       3    15.8           -1.9             NA
# 16 2009     8   2       4   -15.4             NA          -31.2
# 17 2009     9   1       1    14.7           30.1             NA
# 18 2009     9   1       2    15.3            0.6             NA
# 19 2009     9   1       3    15.3            0.0             NA
# 20 2009     9   1       4    14.9           -0.4             NA

Data

dat <- structure(list(year = c(2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2009L), month = c(7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L
), day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L), fivemin = c(1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), 
    rrp_nsw = c(16.9, 17.7, -17.7, 16.7, -15.6, 13.7, 13.7, -13.9, 
    13.8, -13.7, 13.7, 13.8, 17.6, 17.7, 15.8, -15.4, 14.7, 15.3, 
    15.3, 14.9), rrp_qld = c(17.6, 18.8, 18.6, 18.6, 17.6, 15.7, 
    15.8, 16.1, 16, 16.3, -15.8, 16, -17.6, 17.6, 16, 15.6, 15, 
    15.4, 15.6, 15.7), rrp_sa = c(16.7, 17.8, 18.1, -17.6, 16.3, 
    12, 11.9, -12.1, 12.2, 11.6, 11.9, 12.1, 17.3, 17.3, 15.1, 
    14.5, 13.8, 14.3, 14.4, 13.7), rrp_tas = c(15.7, -16.1, 15.9, 
    14.3, 13.2, -11.1, 11.1, 11.2, 11.2, 10.6, 11, 11.2, 16.5, 
    16.8, 15, 14.6, 14, 14.6, 14.5, 13.8), rrp_vic = c(15.5, 
    15.5, 15.4, 12.8, 11.8, -12.9, 12.9, 12.9, 12.8, 12.6, 12.7, 
    12.9, 17.1, 17.4, 15.5, 15.1, 14.5, 15, 15, 14.5)), row.names = c(NA, 
-20L), class = "data.frame")

Upvotes: 0

Related Questions