E.O.

Reputation: 351

How do I apply my function to a dataframe such that it creates a new column on the dataframe?

First, apologies for some awful, illogical, clunky code coming up. I have MINIMAL experience with for loops and functions.

In essence, I want to apply a function to a dataframe. This function provides a value [i] conditional on the values in two of the columns in the dataframe. I then want this value to be populated in a new column, and to align with the row containing the values that generated it.

This uses values from an already fitted model to predict the abundance of an animal species.

I have created a fairly awful function, aligning with the known values of the generated model.

Here is an example of the data:

structure(list(X = 2:6, x = c(23.69772329, 23.33799932, 24.50995071, 
22.37691419, 31.29742091), y = c(-18.75309389, -18.28537894, 
-19.39926585, -19.23678464, -5.251863724), EVAP_Value = c(502L, 
541L, 750L, 476L, 571L), HFI_Value = c(1, 1, 3.059409052, 2.250018061, 
7), TERMAC_Value = c(605L, 605L, 118L, 605L, 236L), TERMAC_ShortName = 
structure(c(4L, 
4L, 1L, 4L, 2L), .Label = c("DAWS2", "EASM", "Marsh", "PV"), class = 
"factor"), 
GLOBCOV_Value = c(30L, 30L, 30L, 140L, 130L), Glob_ShortName = 
structure(c(5L, 
5L, 5L, 1L, 4L), .Label = c("Grass", "OpBdFrst", "OpNdFrst", 
"Shrub", "VegCrop"), class = "factor"), Unknown_Value = c(527L, 
546L, 488L, 430L, 1020L), Location = structure(c(1L, 1L, 
1L, 1L, 2L), .Label = c("BWA", "TZA"), class = "factor"), 
NDVI_mean = c(0.26736562, 0.28850313, 0.328852412, 0.271927773, 
0.364711006), Random_Category = structure(c(2L, 2L, 2L, 2L, 
1L), .Label = c("Random_Maasai", "Random_Southern"), class = "factor"), 
num = c(1L, 1L, 1L, 1L, 1L), ID = structure(c(1L, 1L, 1L, 
1L, 1L), .Label = "Random", class = "factor")), row.names = 2:6, class = 
"data.frame")

For reference, it looks like this:

X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 37.97434  -8.833364       1390  6.000000          601
2 2 23.69772 -18.753094        502  1.000000          605
3 3 23.33800 -18.285379        541  1.000000          605
4 4 24.50995 -19.399266        750  3.059409          118
5 5 22.37691 -19.236785        476  2.250018          605
6 6 31.29742  -5.251864        571  7.000000          236
        TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1             <NA>            90       OpNdFrst          1038
2               PV            30        VegCrop           527
3               PV            30        VegCrop           546
4            DAWS2            30        VegCrop           488
5               PV           140          Grass           430
6             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID
1      TZA 0.5356669   Random_Maasai   1 Random
2      BWA 0.2673656 Random_Southern   1 Random
3      BWA 0.2885031 Random_Southern   1 Random
4      BWA 0.3288524 Random_Southern   1 Random
5      BWA 0.2719278 Random_Southern   1 Random
6      TZA 0.3647110   Random_Maasai   1 Random

The two columns of interest are the TERMAC_ShortName column and the Glob_ShortName column. My efforts so far are:

 predict.bayes.animal <- function(data){
         if (data$TERMAC_ShortName[i] == "PV") {
           bayes_value[i] <- i - 0.772
  }
         if (data$TERMAC_ShortName[i] == "DAWS2") {
            bayes_value[i] <- i - 1.24
  }
         if (data$TERMAC_ShortName[i] == "EASM") {
            bayes_value[i] <- i - 0.362
  }
         if (data$Glob_ShortName[i] == "VegCrop") {
            bayes_value[i] <- i - 0.3497
 }
         if (data$Glob_ShortName[i] == "Grass") {
            bayes_value[i] <- i - 0.5978
  }
         if (data$Glob_ShortName[i] == "Shrub") {
            bayes_value[i] <- i - 0.2285
  }
         if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == 
         "VegCrop") {
            bayes_value[i] <- i - 0.56
  }
         if (data$TERMAC_ShortName[i] == "DAWS2" | data$Glob_ShortName[i] == 
         "VegCrop") 
 {
            bayes_value[i] <- i + 0.43
  }
         if (data$TERMAC_ShortName[i] == "PV" | data$Glob_ShortName[i] == 
         "Grass") {
            bayes_value[i] <- i - 0.49
  }
         if (data$TERMAC_ShortName[i] == "EASM" | data$Glob_ShortName[i] == 
         "Shrub") {
            bayes_value[i] <- i - 0.045
  }
   bayes_value
  }

   data["bayes_value"] <- NA
   for (i in 1:nrow(data)) { 
      n <- predict.bayes.animal(data)
      data$bayes_value[i] <- n
  }

Expected result is:

X        x          y EVAP_Value HFI_Value TERMAC_Value
1 1 23.69772 -18.753094        502  1.000000          605
2 2 23.33800 -18.285379        541  1.000000          605
3 3 24.50995 -19.399266        750  3.059409          118
4 4 22.37691 -19.236785        476  2.250018          605
5 5 31.29742  -5.251864        571  7.000000          236
        TERMAC_ShortName GLOBCOV_Value Glob_ShortName Unknown_Value
1               PV            30        VegCrop           527
2               PV            30        VegCrop           546
3            DAWS2            30        VegCrop           488
4               PV           140          Grass           430
5             EASM           130          Shrub          1020
  Location NDVI_mean Random_Category num     ID   bayes_value
1      BWA 0.2673656 Random_Southern   1 Random       -1.68
2      BWA 0.2885031 Random_Southern   1 Random       -1.68
3      BWA 0.3288524 Random_Southern   1 Random       -1.20
4      BWA 0.2719278 Random_Southern   1 Random       -1.86
5      TZA 0.3647110   Random_Maasai   1 Random       -0.64

The actual result so far is "Error in predict.bayes.animal(data) : object 'bayes_value' not found"

Thank you in advance for any assistance.

Upvotes: 0

Views: 42

Answers (1)

Sarah

Reputation: 3519

As discussed in the comments, it is not entirely clear what you are trying to do, but would dplyr's mutate (to add a new column) and case_when (in place of multiple if statements) simplify things? E.g.:

library(dplyr)
data %>% mutate(bayes_value =
                  case_when(TERMAC_ShortName == "PV" ~ -0.772,
                            TERMAC_ShortName == "DAWS2" ~ -1.24
                            # <OTHER CASES HERE>
                            ))
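
One detail worth knowing before adding more cases: case_when returns NA for any row that matches none of the conditions, unless a catch-all such as TRUE ~ 0 is supplied as the last case. A minimal sketch on a toy vector (the names short_name, no_default and with_default are made up for illustration):

```r
library(dplyr)

# Hypothetical toy vector; "EASM" deliberately has no matching case below
short_name <- c("PV", "DAWS2", "EASM")

# Without a catch-all, the unmatched element becomes NA
no_default <- case_when(short_name == "PV"    ~ -0.772,
                        short_name == "DAWS2" ~ -1.24)

# With a final TRUE ~ 0, the unmatched element becomes 0 instead
with_default <- case_when(short_name == "PV"    ~ -0.772,
                          short_name == "DAWS2" ~ -1.24,
                          TRUE ~ 0)

print(no_default)    # -0.772 -1.240     NA
print(with_default)  # -0.772 -1.240  0.000
```

The 0 default matters if the results of several case_when calls are added together, because NA plus anything is NA.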

REVISED:

  data %>% mutate(bayes_value =
                      case_when(TERMAC_ShortName == "PV" ~ -0.772,
                                TERMAC_ShortName == "DAWS2" ~ -1.24,
                                # <OTHER TERMAC_ShortName CASES HERE>
                                TRUE ~ 0) +
                      case_when(Glob_ShortName == "Grass" ~ -0.5978,
                                # <OTHER Glob CASES HERE>
                                TRUE ~ 0) +
                      case_when(TERMAC_ShortName == "PV" | Glob_ShortName == "VegCrop" ~ -0.56,
                                # <OTHER Combined CASES HERE>
                                TRUE ~ 0))
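
For what it's worth, here is a fully spelled-out sketch on a toy data frame, using the coefficients from the question. Two things in it are assumptions rather than anything confirmed by the asker: the combined conditions use & instead of |, because that appears to be what reproduces most of the expected bayes_value column, and the data frame name toy is made up for the example.

```r
library(dplyr)

# Hypothetical toy data: the four combinations that appear in the question
toy <- data.frame(
  TERMAC_ShortName = c("PV", "DAWS2", "PV", "EASM"),
  Glob_ShortName   = c("VegCrop", "VegCrop", "Grass", "Shrub"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(bayes_value =
           # effect of TERMAC_ShortName alone
           case_when(TERMAC_ShortName == "PV"    ~ -0.772,
                     TERMAC_ShortName == "DAWS2" ~ -1.24,
                     TERMAC_ShortName == "EASM"  ~ -0.362,
                     TRUE ~ 0) +
           # effect of Glob_ShortName alone
           case_when(Glob_ShortName == "VegCrop" ~ -0.3497,
                     Glob_ShortName == "Grass"   ~ -0.5978,
                     Glob_ShortName == "Shrub"   ~ -0.2285,
                     TRUE ~ 0) +
           # combined effect; `&` is an assumption (the question used `|`)
           case_when(TERMAC_ShortName == "PV"    & Glob_ShortName == "VegCrop" ~ -0.56,
                     TERMAC_ShortName == "DAWS2" & Glob_ShortName == "VegCrop" ~  0.43,
                     TERMAC_ShortName == "PV"    & Glob_ShortName == "Grass"   ~ -0.49,
                     TERMAC_ShortName == "EASM"  & Glob_ShortName == "Shrub"   ~ -0.045,
                     TRUE ~ 0))

print(toy)
# bayes_value is approximately -1.6817, -1.1597, -1.8598, -0.6355
```

Three of the four values agree with the expected output in the question after rounding to two decimals; the DAWS2/VegCrop row comes out near -1.16 rather than the -1.20 listed there, so the exact rule you intend for that case may differ. Note also that mutate returns a new data frame, so the result has to be assigned back (as above) for the new column to persist.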

Upvotes: 1
