EleMan

Reputation: 43

Compute new variables and use those variables to compute other variables

With the following data in two columns, I am trying to compute new variables, which I then want to use to compute other variables. The two variables I have are "Area" and "Observed".

Area         Observed
3353        31
2297        2
1590        15
1087        16
817         2
847         10
1014        28
872         29
1026        29
1215        21

I need to compute a new variable called "RelativeArea" by summing "Area" and dividing each "Area" value by the total. For example, 3353/14118 = 0.237.

I then need to compute a new variable called "Expected" by summing the "Observed" column and multiplying that total by each newly computed "RelativeArea" value.

The error I get is: Column "Expected" must be length 10 (the number of rows) or one, not 0

The next column needs to be computed as "O-E" which is the "Observed" column minus the "Expected" (newly computed column). Of course, I can't get to this because of the above error.

I have been able to generate the first new variable "RelativeArea", but cannot create the next one "Expected"

The code I used:

library(tidyverse)
data <- read.csv("data1.csv")
data %>% mutate(RelativeArea = data$Area/sum(data$Area)) ##this works
data %>% mutate(Expected = data$RelativeArea*sum(data$Observed)) ##this DOES NOT WORK and gives me the error: Column "Expected" must be length 10 (the number of rows) or one, not 0 

I would expect the "Expected" column to use the values from the "RelativeArea" column and multiply each value by the sum of the "Observed" values to compute the "Expected" value.

Upvotes: 0

Views: 82

Answers (1)

Dij

Reputation: 1378

Remove the data$, and create each variable in mutate, separated with a comma:

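library(tidyverse)
data <- read.csv("data1.csv")

# Assign the result so the new columns are kept in data
data <- data %>% mutate(RelativeArea = Area / sum(Area),
    Expected = RelativeArea * sum(Observed)) # both columns in one mutate() call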

Explanation

You don't need data$ inside mutate(); refer to the columns by their bare names. Your first mutate() call returns a new data frame containing RelativeArea, but because that result is never assigned, data itself is unchanged. In the second call, data$RelativeArea therefore looks in the original data frame, finds no such column, and returns NULL, which has length 0. That is what the error is telling you. Creating both variables in the same mutate() call, separated by a comma, avoids this, because a column defined earlier in the call is available to the expressions that follow it. You can add the "O-E" column the same way.
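
As a minimal sketch of the whole chain, here is the same approach with the ten rows from the question typed in directly, so it runs without data1.csv. The name O_minus_E is my own stand-in for "O-E" (a column literally named O-E would need backticks), and result is just an illustrative name.

```r
library(dplyr)

# The ten rows from the question, entered by hand instead of read.csv("data1.csv")
data <- data.frame(
  Area     = c(3353, 2297, 1590, 1087, 817, 847, 1014, 872, 1026, 1215),
  Observed = c(31, 2, 15, 16, 2, 10, 28, 29, 29, 21)
)

# The original data frame has no RelativeArea column, so $ returns NULL (length 0)
is.null(data$RelativeArea)

# Each column defined in mutate() can be used by the expressions after it
result <- data %>%
  mutate(
    RelativeArea = Area / sum(Area),
    Expected     = RelativeArea * sum(Observed),
    O_minus_E    = Observed - Expected  # assumed name for the "O-E" column
  )

result
```

Because the RelativeArea values sum to 1, the Expected column should sum to the same total as Observed, which is a quick sanity check on the result.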

Upvotes: 2
