LMC

Reputation: 1

Problem with using filter function to remove missing values from a dataset

I need to remove every observation in which at least one of these variables is missing: `Loading Date`, `Year Built`, `Vessel Type`, and `Cargo Size`.

    anyNA(CW_data$`Loading Date`) #result is FALSE, which means there aren't missing values
    anyNA(CW_data$`Year Built`) #result is TRUE, there are missing values
    anyNA(CW_data$`Vessel Type`)#result is TRUE, there are missing values
    anyNA(CW_data$`Cargo Size`)#result is TRUE, there are missing values

    CW_data_noNA <- filter(CW_data, is.na('Year Built')==FALSE |
                   is.na('Vessel Type'==FALSE)|
                   is.na('Cargo Size')==FALSE |
                     is.na('Loading Date') == FALSE)

I tried the code above, but the resulting dataset is identical to the original one. Can someone explain what I am doing wrong? Many thanks, LMC

Upvotes: 0

Views: 1914

Answers (3)

Guilherme Jardim

Reputation: 163

If you want to use filter, you can do it like this:

library(dplyr)

CW_data_noNA <- CW_data %>%
    filter(!is.na(`Year Built`) & !is.na(`Vessel Type`) &
           !is.na(`Cargo Size`) & !is.na(`Loading Date`))

When column names contain spaces or other unusual characters, you need to wrap them in backticks (`). In general, I think it's better to avoid spaces in column names.
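To see why the single quotes in the question matter, here is a minimal base-R sketch using a hypothetical toy data frame `toy` (not the asker's data). A quoted name is just a character string, so `is.na()` returns a single `FALSE`; inside `filter()` that length-one condition is recycled to every row, so `is.na('Year Built') == FALSE` keeps everything.

```r
# Hypothetical toy data; check.names = FALSE keeps the space in the name
toy <- data.frame(
  `Year Built` = c(2005, NA, 2010),
  check.names = FALSE
)

# Single quotes create a character string, not a column reference:
# is.na() tests the one-element string "Year Built", which is never NA.
quoted <- is.na('Year Built')
print(quoted)

# Backticks (here via $) refer to the column itself,
# so is.na() is evaluated element by element.
column <- is.na(toy$`Year Built`)
print(column)
```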

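Regarding the code you provided: is.na() already returns a logical vector, so you can write !is.na(x) instead of is.na(x) == FALSE. Also note that the conditions must be combined with & (keep a row only if every column is present), not with | as in your code. The pipe %>% also makes the code cleaner.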
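A quick base-R check that the two spellings give the same logical vector, using a hypothetical numeric vector `x`:

```r
# Hypothetical vector with one missing value
x <- c(3, NA, 7)

via_not     <- !is.na(x)          # negate the logical vector directly
via_compare <- is.na(x) == FALSE  # compare each element with FALSE

print(via_not)
print(identical(via_not, via_compare))
```

Both forms are element-wise; `!is.na(x)` is simply shorter and more common in R code.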

Next time, try to provide a reproducible example, with your own data or some sample data, so the problem is easier to understand.
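One common way to build such an example is sketched below. The values in `sample_data` are invented for illustration and only mirror the four column names from the question. `dput()` prints R code that recreates an object, which can be pasted straight into a question; with real data, `dput(head(CW_data))` shares a small slice.

```r
# Hypothetical sample data with the four columns from the question
sample_data <- data.frame(
  `Loading Date` = as.Date(c("2020-01-15", "2020-02-03", "2020-03-10")),
  `Year Built`   = c(2005, NA, 2012),
  `Vessel Type`  = c("Tanker", "Bulker", NA),
  `Cargo Size`   = c(50000, 75000, NA),
  check.names = FALSE  # keep the spaces in the column names
)

# Print code that rebuilds sample_data exactly; paste this into the question
dput(sample_data)
```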

Upvotes: 0

Mercury

Reputation: 111

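This may work in your situation. drop_na() comes from tidyr; called without arguments it drops rows with a missing value in any column, so list the four columns to restrict the check to them:

library(tidyr)

CW_data_noNA <- CW_data %>% drop_na(`Year Built`, `Vessel Type`, `Cargo Size`, `Loading Date`)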
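A package-free alternative is base R's `complete.cases()`, sketched here on a hypothetical toy data frame. The extra `Notes` column (an assumption, not part of the question) shows why it matters which columns are checked: `drop_na()` with no arguments, like `complete.cases()` on the whole data frame, also drops rows where only an unrelated column is missing.

```r
# Hypothetical toy data: "Notes" is an extra column that is allowed to be NA
toy <- data.frame(
  `Year Built`   = c(2005, NA, 2010, 2015),
  `Vessel Type`  = c("Tanker", "Bulker", NA, "Tanker"),
  `Cargo Size`   = c(50000, 60000, 70000, 80000),
  `Loading Date` = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")),
  Notes          = c(NA, "late", NA, "ok"),
  check.names = FALSE
)
required <- c("Year Built", "Vessel Type", "Cargo Size", "Loading Date")

# complete.cases() is TRUE for rows with no NA in the columns it is given
kept_required <- toy[complete.cases(toy[required]), ]

# Checking every column also drops row 1, whose only NA is in Notes
kept_all <- toy[complete.cases(toy), ]

print(nrow(kept_required))
print(nrow(kept_all))
```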

Upvotes: 0

Cettt

Reputation: 11981

You can use filter_at:

library(dplyr)

CW_data_noNA <- filter_at(CW_data, vars(`Year Built`, `Vessel Type`, `Cargo Size`, `Loading Date`),
                          all_vars(!is.na(.)))
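`filter_at()` is superseded in current dplyr, where the documentation points to `if_all()`/`if_any()` inside `filter()` instead. The underlying logic of `all_vars(!is.na(.))` can be spelled out in base R: apply the predicate to each selected column, then combine the results with `&`. This is a sketch on a hypothetical two-column `toy` data frame.

```r
# Hypothetical toy data
toy <- data.frame(
  `Year Built` = c(2005, NA, 2010),
  `Cargo Size` = c(50000, 60000, NA),
  check.names = FALSE
)
cols <- c("Year Built", "Cargo Size")

# Apply the predicate !is.na() to each selected column ...
per_column <- lapply(toy[cols], function(col) !is.na(col))

# ... and require it to hold in all of them, as all_vars() does
keep <- Reduce(`&`, per_column)

print(keep)
print(toy[keep, ])
```

Replacing `&` with `|` in `Reduce()` gives the `any_vars()` behaviour instead.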

If you want to use filter instead, you can do this:

CW_data_noNA <- CW_data %>%
                 filter(!is.na(`Year Built`), !is.na(`Vessel Type`),
                        !is.na(`Cargo Size`), !is.na(`Loading Date`))

This keeps all rows where none of the four columns is NA. Inside filter, conditions separated by commas are always combined with &.
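The choice between `&` and `|` is the other reason the question's attempt kept too many rows. Here is a base-R sketch on a hypothetical two-column `toy` data frame:

```r
# Hypothetical toy data: row 2 has one NA, row 3 has two
toy <- data.frame(
  `Year Built`  = c(2005, NA, NA),
  `Vessel Type` = c("Tanker", "Bulker", NA),
  check.names = FALSE
)

# & : keep a row only if BOTH columns are present (the requirement)
keep_and <- !is.na(toy$`Year Built`) & !is.na(toy$`Vessel Type`)

# | : keep a row if EITHER column is present (what | in the question does)
keep_or <- !is.na(toy$`Year Built`) | !is.na(toy$`Vessel Type`)

print(keep_and)
print(keep_or)
```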

If you instead want to keep the rows where the four columns are not all NA at the same time, use:

CW_data %>%
   filter(!is.na(`Year Built`) | !is.na(`Vessel Type`) |
          !is.na(`Cargo Size`) | !is.na(`Loading Date`))
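The same "not all missing" rule can be sketched in base R by counting the non-missing values per row and keeping rows where the count is positive. The `toy` data below is hypothetical; only row 3 has all four columns missing.

```r
# Hypothetical toy data with the four columns from the question
toy <- data.frame(
  `Year Built`   = c(2005, NA, NA),
  `Vessel Type`  = c("Tanker", NA, NA),
  `Cargo Size`   = c(NA, 60000, NA),
  `Loading Date` = as.Date(c("2020-01-01", NA, NA)),
  check.names = FALSE
)
cols <- c("Year Built", "Vessel Type", "Cargo Size", "Loading Date")

# Number of non-missing values per row among the four columns
present <- rowSums(!is.na(toy[cols]))

# Keep rows with at least one non-missing value (drops only the all-NA row)
kept <- toy[present > 0, ]

print(present)
print(nrow(kept))
```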

Upvotes: 2
