Reputation: 177
I want to remove the "Not Available" in the following data frame, but when I change Number to numeric using the following code, the "Not Available" becomes 4:
c1 <- c("India", "America", "China", "Europe", "Japan")
c2 <- c(2.3, 3.5, "Not Available", 1.2, 1.2)
data <- data.frame(Name=c1, Number=c2)
data$Number <- as.numeric(data$Number)
The result is:
data
## Name Number
## 1 India 2
## 2 America 3
## 3 China 4
## 4 Europe 1
## 5 Japan 1
How can I remove the "Not Available" rows in this data frame?
Upvotes: 3
Views: 5610
Reputation: 887128
We could also read the dataset with na.strings = "Not Available" in read.csv/read.table so that those entries are read in as NA values, which can then be removed with ?is.na, ?complete.cases, or ?na.omit.
df1 <- read.csv("file.csv", na.strings="Not Available")
res <- df1[complete.cases(df1$Number),]
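As a minimal sketch of the same idea without a file on disk, the data can be passed through the text argument of read.csv. The inline CSV below is an assumption standing in for file.csv, with contents mirroring the question's data.

```r
# Inline CSV standing in for "file.csv" (hypothetical contents mirroring the question)
csv_text <- "Name,Number
India,2.3
America,3.5
China,Not Available
Europe,1.2
Japan,1.2"

# na.strings tells read.csv to treat "Not Available" as NA,
# so Number is parsed as a numeric column straight away
df1 <- read.csv(text = csv_text, na.strings = "Not Available")
str(df1$Number)

# Keep only the rows where Number is not NA
res <- df1[!is.na(df1$Number), ]
res
```

Because the missing value is handled at read time, no later as.numeric conversion is needed.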
Upvotes: 2
Reputation: 27388
This is because:
A data.frame only allows a single class of data per column.
When creating a data.frame, the default behaviour (in R versions before 4.0.0) is for character columns to be coerced to factor, which are stored as numeric values (corresponding to factor levels) with labels. Your c2 vector is a character vector since it has a character element ("Not Available"), and as such the Number column of data is a factor column.
When coercing a factor to numeric, the resulting numbers indicate the factor levels.
To achieve the behaviour you're after, you can either prevent the character data from being coerced to a factor when creating the data.frame:
data <- data.frame(Name=c1, Number=c2, stringsAsFactors=FALSE)
data$Number <- as.numeric(data$Number)
data
## Name Number
## 1 India 2.3
## 2 America 3.5
## 3 China NA
## 4 Europe 1.2
## 5 Japan 1.2
Alternatively, you can coerce the factor to numeric via character:
data$Number <- as.numeric(as.character(data$Number))
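To see why the direct conversion returns 2, 3, 4, 1, 1, it may help to look at the factor's levels and internal codes. This is a small sketch; the factor f is built explicitly here (an assumption for illustration) so that the result does not depend on the stringsAsFactors default of the R version in use.

```r
# Build the factor explicitly, using the same values as the question's c2
f <- factor(c("2.3", "3.5", "Not Available", "1.2", "1.2"))

levels(f)        # sorted unique labels: "1.2" "2.3" "3.5" "Not Available"
as.integer(f)    # internal level codes: 2 3 4 1 1

# Going via character recovers the labels first, then parses them as numbers;
# "Not Available" cannot be parsed, so it becomes NA (normally with a warning,
# suppressed here)
num <- suppressWarnings(as.numeric(as.character(f)))
num              # 2.3 3.5  NA 1.2 1.2
```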
Neither of these options will "remove the Not Available rows", as you've requested. They just convert the "Not Available" elements (and any other "text" elements of the Number column) to NA. To remove the rows containing "Not Available", you can do:
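data <- data.frame(Name=c1, Number=c2, stringsAsFactors=FALSE)
# convert first, so that "Not Available" becomes NA and na.omit has something to drop
data$Number <- as.numeric(data$Number)
na.omit(data)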
or, using your original data object:
data <- data.frame(Name=c1, Number=c2)
data[data$Number != 'Not Available', ]
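Note that after filtering by label, the Number column is still a factor (or a character vector in R 4.0.0 and later, where stringsAsFactors defaults to FALSE), not numeric. A sketch of one way to finish the job is below; the name kept is introduced here purely for illustration.

```r
c1 <- c("India", "America", "China", "Europe", "Japan")
c2 <- c(2.3, 3.5, "Not Available", 1.2, 1.2)
data <- data.frame(Name = c1, Number = c2)

# Drop the offending rows first ...
kept <- data[data$Number != "Not Available", ]

# ... then convert; as.character() makes this safe whether Number is a
# factor or already a character column
kept$Number <- as.numeric(as.character(kept$Number))
kept
```

Going through as.character works in both cases, so the snippet should behave the same regardless of how the data frame was created.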
Upvotes: 5