Reputation: 43
I have a data frame in R in which one of the columns contains state abbreviations such as 'AL', 'MD', etc.
Say I want to extract the data for state = 'AL'. The expression dataframe['AL',] only seems to return one row, whereas there are multiple rows for this state.
Can someone help me understand the error in this approach?
Upvotes: 0
Views: 1483
Reputation: 10855
In R, there are always multiple ways to do something. We'll illustrate three different techniques that can be used to subset data in a data frame based on a logical condition.
We'll use data from the 2012 U.S. Hospital Compare Database. We'll check to see whether the data has already been downloaded to disk, and if not, download and unzip it.
if(!file.exists("outcome-of-care-measures.zip")){
dlMethod <- "curl"
if(substr(Sys.getenv("OS"),1,7) == "Windows") dlMethod <- "wininet"
url <- "https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2FProgAssignment3-data.zip"
download.file(url,destfile='outcome-of-care-measures.zip',method=dlMethod,mode="wb")
unzip(zipfile = "outcome-of-care-measures.zip")
}
## read outcome data & keep hospital name, state, and some
## mortality rates. Notice that here we use the extract operator
## to subset columns instead of rows
theData <- read.csv("outcome-of-care-measures.csv",
colClasses = "character")[,c(2,7,11,17,23)]
This first technique matches the one from the other answer, but we illustrate it with both the $ and [[ forms of the extract operator during the subset operation.
# technique 1: extract operator
aSubset <- theData[theData$State == "AL",]
table(aSubset$State)
#  AL
#  98
aSubset <- theData[theData[["State"]] == "AL",]
table(aSubset$State)
#  AL
#  98
Next, we can subset with a base R function such as subset().
# technique 2: subset() function
aSubset <- subset(theData,State == "AL")
table(aSubset$State)
#  AL
#  98
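One practical difference between techniques 1 and 2 shows up when the column contains missing values. With the extract operator, an NA in the logical condition produces a row of NAs in the result, whereas subset() keeps only rows where the condition is TRUE and drops the NA ones (dplyr::filter() drops them as well). The sketch below uses a small made-up data frame (the Score column and its values are hypothetical) and also shows base R's which() as a way to make the bracket form behave like subset():

```r
# Hypothetical data with one missing State value
df <- data.frame(State = c("AL", NA, "AL", "MD"),
                 Score = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# df$State == "AL" is TRUE, NA, TRUE, FALSE; the NA index yields an all-NA row
via_bracket <- df[df$State == "AL", ]

# subset() keeps only rows where the condition is TRUE, so the NA row is dropped
via_subset <- subset(df, State == "AL")

# which() returns only the positions that are TRUE, matching subset()
via_which <- df[which(df$State == "AL"), ]

print(nrow(via_bracket))  # prints [1] 3
print(nrow(via_subset))   # prints [1] 2
```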
Finally, for the tidyverse fans, we'll use dplyr::filter().
# technique 3: dplyr::filter()
aSubset <- dplyr::filter(theData,State == "AL")
table(aSubset$State)
#  AL
#  98
Upvotes: 0
Reputation: 4358
This should work:
mydataframe[mydataframe$state == "AL",]
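As for why the original dataframe['AL',] returns a single row: when a character string is used as the row index, [ matches it against the data frame's row names, not against the values in any column. A data frame read with read.csv() typically has row names "1", "2", ..., so "AL" matches nothing and R returns one row filled with NA. A minimal sketch with a small made-up data frame (the column names state and value are illustrative assumptions):

```r
# Small illustrative data frame; column names are hypothetical
df <- data.frame(state = c("AL", "MD", "AL"),
                 value = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# A character row index is matched against row names ("1", "2", "3"),
# so "AL" matches nothing and a single all-NA row comes back
wrong <- df["AL", ]
print(wrong)

# A logical vector built from the state column selects every matching row
right <- df[df$state == "AL", ]
print(right)
```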
If you want more than one state, use %in%:
mydataframe[mydataframe$state %in% c("AL","MD"),]
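%in% matters here because comparing with == against a vector such as c("AL","MD") recycles the shorter vector position by position, so each row is compared with only one of the two states and matching rows are silently dropped. %in% instead tests each value for membership in the whole set. A small sketch with made-up data (the data frame and its columns are hypothetical):

```r
# Hypothetical example data: two rows each for AL, MD and TX
df <- data.frame(state = c("AL", "AL", "MD", "MD", "TX", "TX"),
                 value = 1:6,
                 stringsAsFactors = FALSE)

# Pitfall: == recycles c("AL", "MD") to length 6 (AL, MD, AL, MD, AL, MD)
# and compares element by element, so only rows 1 and 4 match
recycled <- df[df$state == c("AL", "MD"), ]

# %in% checks each state against the whole set, so rows 1 to 4 match
members <- df[df$state %in% c("AL", "MD"), ]

print(recycled$value)
print(members$value)
```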
Upvotes: 1