Silent

Reputation: 55

Unable to perform iteration in R

I've been practicing with the Titanic dataset and have made steady progress. However, I got stuck when trying to replace the missing 'Age' values with the median age, depending on certain conditions. Here is the dataset:

| Pclass | Name      |   Sex  | Age |
|:------:|:---------:|:------:|:---:|
|    2   |  officer  |  male  |  NA |
|    3   |  mr       |  male  |  27 |
|    3   |  miss     | female |  NA |

Now I want to replace the NAs with the median ages that I calculated and leave the ages already present as they are. For this I used the following code:

age_fill <- function(x){ 

    for (i in length(x$Age)) {

      if (!is.na(x$Age[i])) {
          return(x$Age[i])

      }

      else if(is.na(x$Age[i])){


        if (x$Sex[i] == "female" && x$Pclass[i] == "3" && x$Name[i] == "miss"){
          x$Age[i] = 18
        }

        if (x$Sex[i] == "male" && x$Pclass[i] == "2" && x$Name[i] == "mr"){
          x$Age[i] = 29
        }

        if (x$Sex[i] == "male" && x$Pclass[i] == "3" && x$Name[i] == "officer"){
          x$Age[i] = 25
        }

     }

  }
  return(x)
}

The problem is that nothing changes when I run the code as a function or in a loop. However, if I run it separately for a single row by typing in the row number, it returns the expected result.

Can someone please tell me what I'm doing wrong?

Upvotes: 0

Views: 44

Answers (2)

Martin Gal

Reputation: 16988

Your Question

Regarding your function there are several issues:


age_fill <- function(x){ 

    for (i in length(x$Age)) {

      if (!is.na(x$Age[i])) {
          return(x$Age[i])

      }
# some more code
}
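  1. Your for loop visits only one element: length(x$Age) returns a single number (the number of rows), so i takes only that one value and just the last row is checked. I guess you meant 1:length(x$Age), or better seq_along(x$Age).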

  2. If your function encounters a non-NA value, return(x$Age[i]) stops the function immediately and returns that single value. I don't think that is what you want: for a non-NA value the function should change nothing. Therefore you should remove this whole part:

      if (!is.na(x$Age[i])) {
          return(x$Age[i])

      }

      else 

Your condition

if(is.na(x$Age[i])){
# enter code here
}

is sufficient.
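Putting both fixes together, a corrected version of the loop could look like the sketch below. It assumes the column names and the hard-coded medians from the question. The sample data frame titanic is made up for illustration, Pclass is assumed to be numeric, and seq_along(x$Age) is used because it also behaves correctly for an empty vector.

```r
# Corrected sketch of age_fill(); the medians 18, 29 and 25 come from the question.
age_fill <- function(x) {
  for (i in seq_along(x$Age)) {        # visit every row, not only the last one
    if (is.na(x$Age[i])) {             # rows with a known age are left untouched
      if (x$Sex[i] == "female" && x$Pclass[i] == 3 && x$Name[i] == "miss") {
        x$Age[i] <- 18
      } else if (x$Sex[i] == "male" && x$Pclass[i] == 2 && x$Name[i] == "mr") {
        x$Age[i] <- 29
      } else if (x$Sex[i] == "male" && x$Pclass[i] == 3 && x$Name[i] == "officer") {
        x$Age[i] <- 25
      }
    }
  }
  x                                    # return the modified data frame
}

# Illustrative data, modelled on the table in the question (assumption)
titanic <- data.frame(
  Pclass = c(2, 3, 3, 3, 2, 3),
  Name   = c("officer", "mr", "miss", "mr", "mr", "officer"),
  Sex    = c("male", "male", "female", "male", "male", "male"),
  Age    = c(NA, 27, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

filled <- age_fill(titanic)
filled$Age   # NA 27 18 NA 29 25
```

Rows that match none of the three conditions keep their NA, so further conditions would have to be added for them.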

Alternative approach

Here is a solution using dplyr. It's not a direct answer to your question but I want to show you another approach to your problem. Given a dataset

> df
# A tibble: 6 x 4
  Pclass Name    Sex      Age
   <dbl> <chr>   <chr>  <dbl>
1      2 officer male      NA
2      3 mr      male      27
3      3 miss    female    NA
4      3 mr      male      NA
5      2 mr      male      NA
6      3 officer male      NA

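that I created with the readr package

library(readr)

# read_table2() is deprecated since readr 2.0.0, where read_table() takes over its role
df <- read_table2("Pclass  Name        Sex   Age
    2     officer    male    NA 
    3     mr         male    27 
    3     miss       female   NA 
    3     mr         male     NA
    2     mr         male     NA
    3     officer    male     NA")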

Now we use mutate combined with case_when:

library(dplyr)

# Pclass is numeric in this tibble, so it is compared with numbers.
# Rows that match no condition end up as NA.
df %>%
  mutate(Age = case_when(!is.na(Age) ~ Age,
                         Sex == "male"   & Pclass == 3 & Name == "officer" ~ 25,
                         Sex == "male"   & Pclass == 2 & Name == "mr"      ~ 29,
                         Sex == "female" & Pclass == 3 & Name == "miss"    ~ 18
                        ))

which yields

# A tibble: 6 x 4
  Pclass Name    Sex      Age
   <dbl> <chr>   <chr>  <dbl>
1      2 officer male      NA
2      3 mr      male      27
3      3 miss    female    18
4      3 mr      male      NA
5      2 mr      male      29
6      3 officer male      25

With this approach you need neither a function nor a loop, and your conditions are clearly arranged. As a rule of thumb: try to avoid loops. Usually there are more elegant ways to perform a task without them, because R runs "hidden loops" inside functions that are optimized for performance. However, some tasks are well suited for loops, so the decision depends on the actual task.
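The same idea also works without dplyr. As a sketch in base R, assuming the column names and medians from the question (the data frame below is illustrative and Pclass is assumed to be numeric), logical indexing fills all matching rows at once:

```r
# Illustrative data with the same rows as the tibble above
df <- data.frame(
  Pclass = c(2, 3, 3, 3, 2, 3),
  Name   = c("officer", "mr", "miss", "mr", "mr", "officer"),
  Sex    = c("male", "male", "female", "male", "male", "male"),
  Age    = c(NA, 27, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for every row whose age is missing
missing_age <- is.na(df$Age)

# Each assignment updates every row that matches its condition in one step
df$Age[missing_age & df$Sex == "male"   & df$Pclass == 3 & df$Name == "officer"] <- 25
df$Age[missing_age & df$Sex == "male"   & df$Pclass == 2 & df$Name == "mr"]      <- 29
df$Age[missing_age & df$Sex == "female" & df$Pclass == 3 & df$Name == "miss"]    <- 18

df$Age   # NA 27 18 NA 29 25
```

The mask missing_age is computed once before any assignment, so ages that were already present can never be overwritten.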

Upvotes: 3

I think the function takes a parameter x and returns x, but the for loop is applied to a (I guess) data.frame "comb". To call the function as output <- age_fill(comb), you should replace comb$myVariable with x$myVariable inside the function, so that all operations within the for loop are applied to the argument.
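A related point about the call itself: an R function works on a copy of its argument, so the data frame only changes if the return value is assigned. A minimal illustration, with a hypothetical helper set_first_age() and made-up data:

```r
# Hypothetical helper: modifies its local copy of x and returns it
set_first_age <- function(x) {
  x$Age[1] <- 30
  x
}

comb <- data.frame(Age = c(NA, 27))

unused <- set_first_age(comb)   # comb itself is not modified by the call
is.na(comb$Age[1])              # TRUE

comb <- set_first_age(comb)     # assign the result to keep the change
comb$Age[1]                     # 30
```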

Upvotes: 1
