Reputation: 31
Using the pre-installed dataset in R, mtcars, I'm trying to find the mean of the "mpg" variable for only Mercedes cars. I am new to R and learning on my own. I've figured out the average for mpg of all cars using the following:
read.csv("mtcars.csv")
mean(mtcars$mpg)
I thought of using something like a GROUP BY to group only the Mercedes cars, but I can't seem to figure it out. I'm sure it's really simple, so I'm a little frustrated that I'm not seeing what to do next.
This is what the file looks like: https://gist.github.com/seankross/a412dfbd88b3db70b74b
Upvotes: 1
Views: 30862
Reputation: 2678
You can do this the data.table way too. Use this code:
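library(data.table)
dt <- mtcars
# Convert to a data.table by reference; the row names move into a column named "rn"
setDT(dt, keep.rownames = TRUE)
# Flag Mercedes rows: 1L if the model name contains "Merc", 0L otherwise
dt[, merc := as.integer(grepl("Merc", rn))]
# Write the mean mpg of the Mercedes rows into a new column on those rows
dt[merc == 1L, merc_mean := mean(mpg)]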
The merc_mean column will hold the mean mpg of all the Merc cars on each Merc row; rows for other cars will be NA.
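If only the single number is needed, rather than a new column, a shorter data.table sketch is possible. This assumes the data.table package is installed; the names dt2, merc_mean_value, and by_flag are made up for illustration.

```r
library(data.table)

# Build a fresh data.table; row names are kept in a column named "rn"
dt2 <- as.data.table(mtcars, keep.rownames = TRUE)

# Filter to rows whose name starts with "Merc", then average mpg
merc_mean_value <- dt2[grepl("^Merc", rn), mean(mpg)]
print(merc_mean_value)

# GROUP BY style: mean mpg for Mercedes and non-Mercedes rows side by side
by_flag <- dt2[, .(mean_mpg = mean(mpg)), by = .(is_merc = grepl("^Merc", rn))]
print(by_flag)
```

The first expression returns a plain numeric value; the second returns a two-row table, one row per value of is_merc.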
Upvotes: 0
Reputation: 39154
In base R, mtcars is a built-in data frame. You can type mtcars in the console to view it.
Here I printed the first 10 rows of the mtcars data frame.
head(mtcars, 10)
#                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
# Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
The information you need, the model, is stored in the row names. To access that information, we can use the rownames function.
rownames(mtcars)
# [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
# [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
# [7] "Duster 360" "Merc 240D" "Merc 230"
# [10] "Merc 280" "Merc 280C" "Merc 450SE"
# [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
# [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
# [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
# [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
# [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
# [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
# [31] "Maserati Bora" "Volvo 142E"
The next thing we need to do is check which row names match "Merc". We can use grepl to achieve this: it returns a logical vector with one element per row name, TRUE where the pattern matches and FALSE otherwise. Here "^Merc" matches only strings that begin with "Merc".
grepl("^Merc", rownames(mtcars))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
# [14] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [27] FALSE FALSE FALSE FALSE FALSE FALSE
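To see what the ^ anchor changes, here is a small sketch on a made-up vector. The strings in models are hypothetical and are not taken from mtcars.

```r
# Hypothetical model names, used only to illustrate the anchor
models <- c("Merc 230", "Super Merc", "Mazda RX4")

# Anchored: matches only strings that start with "Merc"
anchored <- grepl("^Merc", models)

# Unanchored: matches "Merc" anywhere in the string
unanchored <- grepl("Merc", models)

print(anchored)
print(unanchored)

# Summing a logical vector counts its TRUE values,
# which gives the number of Mercedes rows in mtcars
n_merc <- sum(grepl("^Merc", rownames(mtcars)))
print(n_merc)
```

In mtcars both patterns select the same rows, because every row name containing "Merc" also begins with it. The anchor only matters if a name could contain "Merc" somewhere in the middle.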
Finally, we can use the logical vector to subset the mtcars data frame. After the subset, we can calculate the average of mpg of the subset.
mtcars_merc <- mtcars[grepl("^Merc", rownames(mtcars)), ]
mean(mtcars_merc$mpg)
# [1] 19.01429
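Since the question mentions GROUP BY, here is a rough base R sketch of that idea. It assumes the make is the first word of each row name, which holds for "Merc" but is only a convention of this dataset; the names make and mpg_by_make are made up for illustration.

```r
# Take everything before the first space as the make
# (names without a space, such as "Valiant", are kept whole)
make <- sub(" .*", "", rownames(mtcars))

# Group mpg by make and average within each group
mpg_by_make <- tapply(mtcars$mpg, make, mean)

# Pull out the Mercedes group
print(mpg_by_make["Merc"])

# The same subset-and-average result in a single line, without grouping
merc_mpg <- mean(mtcars$mpg[grepl("^Merc", rownames(mtcars))])
print(merc_mpg)
```

tapply returns a named vector with one mean per make, so the other makes are available from the same object.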
Upvotes: 4