Frash

Reputation: 728

Exclude rows with 'NA' when subsetting a data frame

The data has a character column that contains the string 'NA'. When I subset the data on that column, the result also includes the rows that have 'NA' in that column, and every such row is filled with NA instead of the actual data.

Below is a sample of the data:

sample.csv

SYMBOL,SERIES,CLOSE,TIMESTAMP
A2ZMES,EQ,10.8,4/1/2014
IIFLFIN,NA,999.2,4/1/2014
SCIT,NA,1150,4/1/2014
IIFLFIN,NA,1019.81,8/1/2014
IRFC,NA,1098.09,8/1/2014
AICHAMP,BE,14.15,4/1/2014

Here is the code I've used:

data = read.csv('sample.csv', as.is = TRUE)
subdata = data[data$SERIES == 'EQ', ]
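The NA values in SERIES come from read.csv itself: its na.strings argument defaults to "NA", so the literal text NA in sample.csv is read as a missing value, and comparing a missing value with 'EQ' gives NA rather than FALSE. A minimal sketch of keeping NA as ordinary text, using three sample rows inlined through the text argument (the variable names are illustrative) and assuming no field in the file uses the text NA to mean "missing":

```r
# Three rows from sample.csv, inlined so the example runs without the file
csv_text <- "SYMBOL,SERIES,CLOSE,TIMESTAMP
A2ZMES,EQ,10.8,4/1/2014
IIFLFIN,NA,999.2,4/1/2014
AICHAMP,BE,14.15,4/1/2014"

# Default na.strings = "NA": the text NA becomes a missing value
with_default <- read.csv(text = csv_text, as.is = TRUE)
print(is.na(with_default$SERIES))   # FALSE TRUE FALSE

# na.strings = "": only empty fields count as missing, so NA stays a plain string
keep_literal <- read.csv(text = csv_text, as.is = TRUE, na.strings = "")
eq_rows <- keep_literal[keep_literal$SERIES == "EQ", ]  # no all-NA rows
na_rows <- keep_literal[keep_literal$SERIES == "NA", ]  # matches the text "NA"
```

With this setting a genuinely empty SERIES field would still become NA, so the NA-safe comparisons in the answers remain worth using.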

I want only the rows that match EQ in the SERIES column, without those 'NA' rows spoiling the result. EQ is just an example; sometimes I need to match some other string, maybe even 'NA'.

Any help or pointers to resolve this niggling issue would be appreciated. A base R solution would be best, but I'm open to using any package. Thanks for reading.

Upvotes: 1

Views: 317

Answers (2)

Miha Trošt

Reputation: 2022

library(dplyr)

Data

data <-
    structure(list(SYMBOL = c("A2ZMES", "IIFLFIN", "SCIT", "IIFLFIN", "IRFC", "AICHAMP"),
                   SERIES = c("EQ", NA, NA, NA, NA, "BE"), 
                   CLOSE = c(10.8, 999.2, 1150, 1019.81, 1098.09, 14.15), 
                   TIMESTAMP = c("4/1/2014", "4/1/2014", "4/1/2014", "8/1/2014", "8/1/2014", "4/1/2014")), 
              .Names = c("SYMBOL", "SERIES", "CLOSE", "TIMESTAMP"), 
              class = "data.frame", 
              row.names = c(NA, -6L))

Solution

# keep rows where SERIES is not missing

data %>% 
    filter(!is.na(SERIES))

   SYMBOL SERIES CLOSE TIMESTAMP
1  A2ZMES     EQ 10.80  4/1/2014
2 AICHAMP     BE 14.15  4/1/2014
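The filter above keeps both the EQ and the BE rows. A sketch of filtering directly on the value, assuming dplyr is installed: filter() keeps only rows where the condition is TRUE, so rows where SERIES == "EQ" evaluates to NA are dropped without an explicit is.na() check.

```r
library(dplyr)

data <- data.frame(
  SYMBOL = c("A2ZMES", "IIFLFIN", "SCIT", "IIFLFIN", "IRFC", "AICHAMP"),
  SERIES = c("EQ", NA, NA, NA, NA, "BE"),
  CLOSE  = c(10.8, 999.2, 1150, 1019.81, 1098.09, 14.15),
  stringsAsFactors = FALSE
)

# Rows where the condition is NA are dropped, just like FALSE rows
eq_rows <- data %>% filter(SERIES == "EQ")

# Rows with a missing SERIES can still be selected explicitly
na_rows <- data %>% filter(is.na(SERIES))
```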

Upvotes: 0

wijjy

Reputation: 11

NA values are not comparable to anything, so data$SERIES == 'EQ' returns NA (not FALSE) for those rows. You also need to exclude the NA values explicitly:

subdata = data[!is.na(data$SERIES) & data$SERIES == 'EQ', ]

From ?"==":

Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA. Missing values can also result when character strings are compared and one is not valid in the current collation locale.
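A minimal base R sketch of the behaviour quoted above, using the same six rows as the first answer (SERIES already holding missing values). It also shows three base R idioms that treat a missing comparison as a non-match; the variable names are illustrative.

```r
data <- data.frame(
  SYMBOL = c("A2ZMES", "IIFLFIN", "SCIT", "IIFLFIN", "IRFC", "AICHAMP"),
  SERIES = c("EQ", NA, NA, NA, NA, "BE"),
  CLOSE  = c(10.8, 999.2, 1150, 1019.81, 1098.09, 14.15),
  stringsAsFactors = FALSE
)

# The comparison yields NA (not FALSE) wherever SERIES is missing
cond <- data$SERIES == "EQ"
print(cond)                 # TRUE NA NA NA NA FALSE

# Indexing with cond returns the EQ row plus one all-NA row per NA in cond
print(nrow(data[cond, ]))   # 5

# Alternatives that treat a missing comparison as "no match"
by_which  <- data[which(data$SERIES == "EQ"), ]  # which() skips NA positions
by_in     <- data[data$SERIES %in% "EQ", ]       # %in% never returns NA
by_subset <- subset(data, SERIES == "EQ")        # subset() takes NA as FALSE
```

When the rows you want are the ones whose SERIES is missing, is.na(data$SERIES) is the condition to use instead of a comparison.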

Upvotes: 1
