Reputation: 728
The data has a character column that also contains the string 'NA'. When I subset the data on that column, I also get the rows that have 'NA' in that column, and every such row comes back filled with NA in place of the actual data.
A sample of the data is given below:
sample.csv
SYMBOL,SERIES,CLOSE,TIMESTAMP
A2ZMES,EQ,10.8,4/1/2014
IIFLFIN,NA,999.2,4/1/2014
SCIT,NA,1150,4/1/2014
IIFLFIN,NA,1019.81,8/1/2014
IRFC,NA,1098.09,8/1/2014
AICHAMP,BE,14.15,4/1/2014
The code I've used for this purpose:
data = read.csv('sample.csv', as.is = TRUE)
subdata = data[data$SERIES == 'EQ', ]
I want only the rows where the SERIES column matches EQ; I don't want those 'NA' rows spoiling the result. EQ is just a representative value; sometimes I will need to match some other string, maybe even 'NA'.
Please help, or give any pointers for resolving this niggling issue. A base R solution would be best, but I'm open to using any package. Thanks for reading.
Upvotes: 1
Views: 317
Reputation: 2022
library(dplyr)
data <-
  structure(list(SYMBOL = c("A2ZMES", "IIFLFIN", "SCIT", "IIFLFIN", "IRFC", "AICHAMP"),
                 SERIES = c("EQ", NA, NA, NA, NA, "BE"),
                 CLOSE = c(10.8, 999.2, 1150, 1019.81, 1098.09, 14.15),
                 TIMESTAMP = c("4/1/2014", "4/1/2014", "4/1/2014", "8/1/2014", "8/1/2014", "4/1/2014")),
            .Names = c("SYMBOL", "SERIES", "CLOSE", "TIMESTAMP"),
            class = "data.frame",
            row.names = c(NA, -6L))
# keep only the rows where SERIES is not missing
data %>%
  filter(!is.na(SERIES))
SYMBOL SERIES CLOSE TIMESTAMP
1 A2ZMES EQ 10.80 4/1/2014
2 AICHAMP BE 14.15 4/1/2014
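If 'NA' in SERIES is a genuine code rather than a missing value, another option is to stop read.csv from converting it in the first place. The following is a minimal base R sketch, assuming that empty fields are the only true missing values in the file (that assumption is mine, not stated in the question). The names csv_text and data_literal are hypothetical, and the text argument stands in for sample.csv:

```r
# Sample rows from the question, inlined so the sketch is self-contained.
csv_text <- "SYMBOL,SERIES,CLOSE,TIMESTAMP
A2ZMES,EQ,10.8,4/1/2014
IIFLFIN,NA,999.2,4/1/2014
SCIT,NA,1150,4/1/2014
IIFLFIN,NA,1019.81,8/1/2014
IRFC,NA,1098.09,8/1/2014
AICHAMP,BE,14.15,4/1/2014"

# na.strings = "" treats only empty fields as missing, so the literal
# string "NA" survives as ordinary character data.
data_literal <- read.csv(text = csv_text, as.is = TRUE, na.strings = "")

# Plain comparisons now behave as expected, including a match on "NA".
eq_rows <- data_literal[data_literal$SERIES == "EQ", ]
na_rows <- data_literal[data_literal$SERIES == "NA", ]
```

With the file on disk, the same call would be read.csv('sample.csv', as.is = TRUE, na.strings = "").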
Upvotes: 0
Reputation: 11
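NA values are not comparable to anything, so data$SERIES == 'EQ' evaluates to NA for those rows, and indexing a data frame with an NA returns a row filled with NA. You need to exclude the NA values explicitly:
subdata = data[!is.na(data$SERIES) & data$SERIES == 'EQ', ]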
From ?"==":
Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA. Missing values can also result when character strings are compared and one is not valid in the current collation locale.
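The behavior quoted above, and a few base R ways to sidestep it, can be sketched as follows. This is an illustration on a rebuilt copy of the sample data rather than the only way to do it; the name df and the intermediate variable names are mine:

```r
# Rebuild the sample with real NA values in SERIES, which is what
# read.csv produces by default for the string "NA".
df <- data.frame(
  SYMBOL = c("A2ZMES", "IIFLFIN", "SCIT", "IIFLFIN", "IRFC", "AICHAMP"),
  SERIES = c("EQ", NA, NA, NA, NA, "BE"),
  CLOSE = c(10.8, 999.2, 1150, 1019.81, 1098.09, 14.15),
  stringsAsFactors = FALSE
)

# Comparing with NA gives NA, and an NA row index yields an all-NA row.
cmp <- df$SERIES == "EQ"
naive <- df[cmp, ]

# which() keeps only the positions that are TRUE, dropping the NA ones.
via_which <- df[which(cmp), ]

# %in% is built on match() and returns FALSE, not NA, for missing values.
via_in <- df[df$SERIES %in% "EQ", ]

# subset() treats NA in the condition as FALSE.
via_subset <- subset(df, SERIES == "EQ")

# To select the missing rows themselves, test with is.na() instead of ==.
missing_rows <- df[is.na(df$SERIES), ]
```

Any of the three alternatives returns only the A2ZMES row, while naive also carries four rows of NA.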
Upvotes: 1