dtg37

Reputation: 13

R apply formula and conditional logic to dataframe

I have a dataframe with numeric variables for the wet weight and dry weight of samples, say soil. Some values in this dataframe are equal to 0 and others are greater than zero. I want to apply a formula to these variables to create a new variable, but only for the rows where both values are greater than zero. So far, I have tried the filter function from dplyr.

I want to create the new variable using the following formula:

moisture content = (wet weight - dry weight)/wet weight

Here is the code I have tried thus far:

dry_weight <- c(0,1,0,2,0,3,4,5,6,7)
wet_weight <- c(1,0,2,4,0,1,4,0,5,0)
weights <- data.frame(dry_weight, wet_weight)
weights$moisture <- weights %>%
  filter(weights$wet_weight > 0, weights$dry_weight >0) %>%
  mutate((weights$wet_weight-weights$dry_weight)/weights$wet_weight)

I am not sure if mutate is the right approach, but when I execute the code I get:

"Error: Column `(weights$wet_weight - weights$dry_weight)/weights$wet_weight` must
 be length 4 (the number of rows) or one, not 10"

Any suggestions would be appreciated.

Upvotes: 1

Views: 302

Answers (3)

Ronak Shah

Reputation: 388787

A vectorized way:

#Initialize column to NA
weights$moisture <- NA
#Get the index where dry_weight > 0 and wet_weight > 0
inds <- with(weights, dry_weight > 0 & wet_weight >0)
#Calculate using the formula and replace the value.
weights$moisture[inds] <- with(weights, 
                          (wet_weight[inds] - dry_weight[inds])/wet_weight[inds])


weights
#   dry_weight wet_weight moisture
#1           0          1       NA
#2           1          0       NA
#3           0          2       NA
#4           2          4      0.5
#5           0          0       NA
#6           3          1     -2.0
#7           4          4      0.0
#8           5          0       NA
#9           6          5     -0.2
#10          7          0       NA

Upvotes: 0

coffeinjunky

Reputation: 11514

Another approach would be to simply use base R:

weights$moisture <- 
              ifelse(weights$dry_weight*weights$wet_weight > 0
                     , 1-weights$dry_weight/weights$wet_weight
                     , NA)
weights
   dry_weight wet_weight moisture
1           0          1       NA
2           1          0       NA
3           0          2       NA
4           2          4      0.5
5           0          0       NA
6           3          1     -2.0
7           4          4      0.0
8           5          0       NA
9           6          5     -0.2
10          7          0       NA

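ifelse is a vectorised if of the form ifelse(condition, value if true, value if false). Here, I check whether the product of the two weights is strictly greater than zero, which for non-negative weights means both values are strictly greater than zero. If so, I return the moisture; otherwise I return NA. Note that 1 - dry_weight/wet_weight is the same quantity as (wet_weight - dry_weight)/wet_weight.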
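
As a minimal sketch of that element-by-element behaviour, here is ifelse applied to three made-up pairs of values (the vectors dry and wet are hypothetical and not taken from the question). It spells out the two conditions explicitly rather than using the product:

```r
# Hypothetical toy values, chosen only to illustrate ifelse()
dry <- c(0, 2, 3)
wet <- c(1, 4, 0)

# The condition is checked for each position separately;
# positions that fail the check receive NA
res <- ifelse(dry > 0 & wet > 0, (wet - dry) / wet, NA)
res
```

Only the second pair passes the check, so only that position gets a computed moisture value.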

Upvotes: 1

Ian Campbell

Reputation: 24770

I hope this will get you started.

First, no need to keep typing weights$ every time when you're using pipes (%>%).

Second, with mutate, you should give the new column a name on the left-hand side of =; otherwise the column is named after the whole expression.

library(dplyr)
weights %>%
  dplyr::filter(wet_weight > 0 & dry_weight > 0) %>%
  mutate(moisture = (wet_weight - dry_weight)/wet_weight)
#  dry_weight wet_weight moisture
#1          2          4      0.5
#2          3          1     -2.0
#3          4          4      0.0
#4          6          5     -0.2

Remember, if you want to assign this back to weights, just add weights <- to the beginning of the first line.
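
Note that filter drops the rows that fail the condition, which is why the result above has 4 rows rather than 10. If the goal is to keep every row and leave the others as NA, one possible sketch is to skip filter and put the condition inside mutate using dplyr's if_else. This is an assumption about what is wanted, not necessarily the only way to do it:

```r
library(dplyr)

# Data from the question
dry_weight <- c(0,1,0,2,0,3,4,5,6,7)
wet_weight <- c(1,0,2,4,0,1,4,0,5,0)
weights <- data.frame(dry_weight, wet_weight)

# Keep all 10 rows; compute moisture only where both weights are > 0.
# if_else() requires both branches to have the same type, hence NA_real_.
weights <- weights %>%
  mutate(moisture = if_else(wet_weight > 0 & dry_weight > 0,
                            (wet_weight - dry_weight) / wet_weight,
                            NA_real_))
weights
```

Because no rows are removed, the new column has the same length as the original dataframe, which avoids the length mismatch reported in the question.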

Upvotes: 1
