Reputation: 1471
We have a data frame from a CSV file. The data frame DF
has columns that contain observed values and a column (VaR2
) that contains the date at which a measurement has been taken. If the date was not recorded, the CSV file contains the value NA
, for missing data.
Var1 Var2
10 2010/01/01
20 NA
30 2010/03/01
We would like to use the subset command to define a new data frame new_DF
such that it only contains rows that have an NA'
value from the column (VaR2
). In the example given, only Row 2 will be contained in the new DF
.
The command
new_DF<-subset(DF, DF$Var2=="NA")
does not work, the resulting data frame has no row entries.
If in the original CSV file the Value NA
are exchanged with NULL
, the same command produces the desired result:
new_DF <- subset(DF, DF$Var2=="NULL")
How can I get this method working, if for the character string the value NA
is provided in the original CSV file?
Upvotes: 135
Views: 442200
Reputation: 17174
As another approach using the relatively recent hablar package also makes this easy with the find_na()
function, and can be used as-is or within the dplyr
pipe:
Some sample data
myCars <- mtcars[1:5,1:3]
myCars[c(1,5), 1] <- NA
myCars[c(3,4), 2] <- NA
# mpg cyl disp
# Mazda RX4 NA 6 160
# Mazda RX4 Wag 21.0 6 160
# Datsun 710 22.8 NA 108
# Hornet 4 Drive 21.4 NA 258
# Hornet Sportabout NA 8 360
Code:
# native
hablar::find_na(myCars, cyl)
# dplyr
myCars %>%
hablar::find_na(cyl)
Both return:
# mpg cyl disp
# Datsun 710 22.8 NA 108
# Hornet 4 Drive 21.4 NA 258
Note that hthis can be extended: hablar::find_na(myCars, c(cyl, mpg))
will return NA
values from both columns, and hablar::find_na(myCars)
will return all rows with NA
in any column.
Upvotes: 0
Reputation: 56
This will give the tally of number of NA (TRUE) and FALSE values in the table:
table(is.na(df))
Upvotes: 0
Reputation: 1
In addition to the main answer, if you want all rows with one or more NAs:
DF_rows_with_NAs = DF[rowSums(is.na(DF)) > 0, ]
DF_rows_with_NAs
sum(is.na(DF_rows_with_NAs))
This should equal the total NAs of the whole DF.
DF_columns_with_NAs=DF[ ,colSums(is.na(DF)) > 0]
DF_columns_with_NAs
Verification:
sum(is.na(ts_columns_with_NAs))
This should equal the total NAs of the whole DF.
Upvotes: 0
Reputation: 16978
Since dplyr
s filter_all
has been superseded
Scoped verbs (_if, _at, _all) have been superseded by the use of
across()
in an existing verb.
and the usage of across()
in filter()
is deprecated, Ronak Pol's answer needs a small update. To find all rows with an NA
anywhere, we could use
library(dplyr)
DF %>%
filter(if_any(everything(), is.na))
to get
# A tibble: 1 x 2
Var1 Var2
<dbl> <date>
1 20 NA
Upvotes: 8
Reputation: 381
new_data <- data %>% filter_all(any_vars(is.na(.)))
This should create a new data frame (new_data
) with only the missing values in it.
Works best to keep a track of values that you might later drop because they had some columns with missing observations (NA).
Upvotes: 28
Reputation: 3439
complete.cases
gives TRUE
when all values in a row are not NA
DF[!complete.cases(DF), ]
Upvotes: 67
Reputation: 877
Prints all the rows with NA data:
tmp <- data.frame(c(1,2,3),c(4,NA,5));
tmp[round(which(is.na(tmp))/ncol(tmp)),]
Upvotes: -1
Reputation: 108523
Never use =='NA' to test for missing values. Use is.na()
instead. This should do it:
new_DF <- DF[rowSums(is.na(DF)) > 0,]
or in case you want to check a particular column, you can also use
new_DF <- DF[is.na(DF$Var),]
In case you have NA character values, first run
Df[Df=='NA'] <- NA
to replace them with missing values.
Upvotes: 193
Reputation: 983
NA is a special value in R, do not mix up the NA value with the "NA" string. Depending on the way the data was imported, your "NA" and "NULL" cells may be of various type (the default behavior is to convert "NA" strings to NA values, and let "NULL" strings as is).
If using read.table() or read.csv(), you should consider the "na.strings" argument to do clean data import, and always work with real R NA values.
An example, working in both cases "NULL" and "NA" cells :
DF <- read.csv("file.csv", na.strings=c("NA", "NULL"))
new_DF <- subset(DF, is.na(DF$Var2))
Upvotes: 42