Reputation: 135

R: calculating new variables to a dataset by applying different formulas

I have a dataframe "data" with the following structure:

structure(list(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0), height = c(165, 
178, 145, 132), weight = c(65, 73, 60, 45)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

And I would like to add to this data.frame two new variables (var1, var2), which should be calculated with the two following formulas:

var1 = age*height + (4 if sex==1 OR 2 if sex==0)

var2 = height*weight + (1 if age>40 or 2 if age=<40)

I am having trouble both with adding the two variables to the data frame and with applying a function: I tried to write one, but it seems it can only be applied to a single value, not to all values across all rows.

Can anyone help me, please?

Upvotes: 0

Answers (4)

IRTFM

Reputation: 263481

akrun's suggestion of using Boolean arithmetic is a good one, but you could also simply write a Boolean version of your own expression, substituting multiplication for the if statements (with mild editing of the "=<" to "<=").

data <- structure(list(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0),
                       height = c(165, 178, 145, 132), weight = c(65, 73, 60, 45)),
                  row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

data <- within(data, {var1 = age*height + 4*(sex==1) + 2 *(sex==0); 
                      var2 = height*weight + (age>40) + 2 *(age <= 40)})
#----
> data
  age sex height weight  var2 var1
1  45   1    165     65 10726 7429
2   4   0    178     73 12996  714
3  32   1    145     60  8702 4644
4  45   0    132     45  5941 5942

Since the two sets of conditions are each disjoint, the "non-qualifying" choice terms will each be 0.
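
To see why the non-qualifying term drops out, here is a minimal base-R sketch of the same Boolean arithmetic on a bare vector. The name bonus is only illustrative and does not come from the answer above.

```r
# Logical values are coerced to 0 (FALSE) and 1 (TRUE) in arithmetic.
sex <- c(1, 0, 1, 0)

# For each element exactly one of the two comparisons is TRUE,
# so each element picks up either 4 or 2, never both.
bonus <- 4 * (sex == 1) + 2 * (sex == 0)
print(bonus)
```

One consequence worth keeping in mind: a sex value other than 0 or 1 would make both comparisons FALSE, so that row would quietly get an adjustment of 0.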

Upvotes: 2

akrun

Reputation: 887881

Adding 1 to the binary sex column turns it into a numeric index (1 or 2) that picks 2 or 4 out of a lookup vector. For var2, adding 1 to the logical age <= 40 gives the 1 or 2 directly.

library(dplyr)
data %>% 
  mutate(var1 = (age * height) + c(2, 4)[sex + 1],
         var2 = (height * weight) + (age <= 40) + 1)
# A tibble: 4 x 6
#    age   sex height weight  var1  var2
#  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
#1    45     1    165     65  7429 10726
#2     4     0    178     73   714 12996
#3    32     1    145     60  4644  8702
#4    45     0    132     45  5942  5941
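
The two adjustment terms can be looked at on their own, without dplyr. This is only an illustrative sketch; the names sex_adj and age_adj are made up for the example.

```r
sex <- c(1, 0, 1, 0)
age <- c(45, 4, 32, 45)

# sex + 1 maps 0 -> position 1 and 1 -> position 2 of the lookup vector c(2, 4).
sex_adj <- c(2, 4)[sex + 1]

# (age <= 40) is TRUE/FALSE, which counts as 1/0, so adding 1 yields 2 or 1.
age_adj <- (age <= 40) + 1

print(sex_adj)
print(age_adj)
```

The lookup form relies on sex being coded exactly as 0 and 1. With any other coding the index would point at the wrong position or return NA.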

Upvotes: 2

Ariel

Reputation: 405

I prefer case_when() from the dplyr package.

Your original data is:

data <- 
structure(
  list(age = c(45, 4, 32, 45),
       sex = c(1, 0, 1, 0),
       height = c(165, 178, 145, 132),
       weight = c(65, 73, 60, 45)),
  row.names = c(NA, -4L),
  class = c("tbl_df", "tbl", "data.frame"))

The new variables are created by:

library(dplyr)

data <- data %>%
  mutate(var1 = case_when(sex == 1 ~ age * height + 4,
                          sex == 0 ~ age * height + 2),
         var2 = case_when(age > 40 ~ height * weight + 1,
                          age <= 40 ~ height * weight + 2))

The outcome is:

# A tibble: 4 x 6
    age   sex height weight  var1  var2
  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
1    45     1    165     65  7429 10726
2     4     0    178     73   714 12996
3    32     1    145     60  4644  8702
4    45     0    132     45  5942  5941

Upvotes: 2

John Carty

Reputation: 262

The function ifelse() is vectorized, so it applies the condition to each element of the vector.

df <- structure(list(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0), height = c(165, 
178, 145, 132), weight = c(65, 73, 60, 45)), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))

df$var1 <- ifelse(df$sex == 1, (df$age * df$height) + 4, (df$age * df$height) + 2)
df$var2 <- ifelse(df$age > 40, (df$weight * df$height) + 1, (df$weight * df$height) + 2)

Final output:

> df
# A tibble: 4 x 6
    age   sex height weight  var1  var2
  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
1    45     1    165     65  7429 10726
2     4     0    178     73   714 12996
3    32     1    145     60  4644  8702
4    45     0    132     45  5942  5941
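
Since the question mentions a function that only worked on a single value, here is one way the same ifelse() logic might be wrapped in a function that handles whole columns. It is a sketch, not the asker's original function: the name add_vars is hypothetical, and a plain data.frame is used so that no extra packages are needed.

```r
# Hypothetical helper: expects columns age, sex, height and weight.
# ifelse() evaluates the condition element by element, so each row
# gets its own adjustment (an if statement expects a single TRUE/FALSE).
add_vars <- function(d) {
  d$var1 <- d$age * d$height + ifelse(d$sex == 1, 4, 2)
  d$var2 <- d$height * d$weight + ifelse(d$age > 40, 1, 2)
  d
}

df <- data.frame(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0),
                 height = c(165, 178, 145, 132), weight = c(65, 73, 60, 45))
res <- add_vars(df)
print(res)
```

Keeping only the adjustment inside ifelse() avoids repeating the product in both branches, which makes it harder to mix up the columns.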

Upvotes: 2
