Reputation: 31

How to find mean for subset using R?

Using the pre-installed dataset in R, mtcars, I'm trying to find the mean of the "mpg" variable for only Mercedes cars. I am new to R and learning on my own. I've figured out the average for mpg of all cars using the following:

read.csv("mtcars.csv")
mean(mtcars$mpg)

I thought of using something like a GROUP BY to group only the Mercedes cars, but I can't figure it out. I'm sure it's really simple, so I'm a little frustrated that I'm not seeing what to do next.

This is what the file looks like: https://gist.github.com/seankross/a412dfbd88b3db70b74b

Upvotes: 1

Answers (2)

sm925

Reputation: 2678

You can also do this the data.table way:

library(data.table)

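# Copy mtcars into a data.table; the car names end up in a column called "rn"
dt <- as.data.table(mtcars, keep.rownames = TRUE)
# Flag the Merc rows with 1L and every other car with 0L
dt[, merc := as.integer(grepl("Merc", rn))]
# Write the mean mpg of the Merc cars into a new column, on the Merc rows only
dt[merc == 1L, merc_mean := mean(mpg)]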

The merc_mean column then holds the mean mpg of the Merc cars on every Merc row, and NA on every other row.
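
If only the single number is needed rather than a new column, a shorter variant is possible. This is a minimal sketch, assuming data.table is installed; the names cars_dt and merc_mpg are made up for illustration.

```r
library(data.table)

# Build a fresh data.table from the built-in mtcars;
# keep.rownames = TRUE puts the car names in a column called "rn"
cars_dt <- as.data.table(mtcars, keep.rownames = TRUE)

# Select the Merc rows in i and compute the mean in j
merc_mpg <- cars_dt[grepl("Merc", rn), mean(mpg)]
merc_mpg
# [1] 19.01429
```

This returns a plain numeric value, so nothing is added to the table.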

Upvotes: 0

www

Reputation: 39154

In base R, mtcars is a built-in data frame. You can type mtcars in the console to view it.

Here I printed the first 10 rows of the mtcars data frame.

head(mtcars, 10)
#                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
# Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

The information you need, the model, is stored in the row names. To access that information, we can use the rownames function.

rownames(mtcars)
# [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
# [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
# [7] "Duster 360"          "Merc 240D"           "Merc 230"           
# [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
# [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
# [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
# [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
# [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
# [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
# [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
# [31] "Maserati Bora"       "Volvo 142E"

The next thing we need to do is check which row names match "Merc". We can use grepl for this: it returns a logical vector with TRUE for every element that matches the pattern and FALSE otherwise. Here "^Merc" matches only strings that begin with "Merc".

grepl("^Merc", rownames(mtcars))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
# [14]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [27] FALSE FALSE FALSE FALSE FALSE FALSE
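
To see what the ^ anchor changes, here is a small sketch on a made-up vector. The names in car_names are hypothetical and are not in mtcars.

```r
# Hypothetical names, chosen only to show the effect of the anchor
car_names <- c("Merc 230", "Big Merc", "Mercury Cougar")

# With "^", only strings that start with "Merc" match
anchored <- grepl("^Merc", car_names)
anchored
# [1]  TRUE FALSE  TRUE

# Without "^", "Merc" may appear anywhere in the string
unanchored <- grepl("Merc", car_names)
unanchored
# [1] TRUE TRUE TRUE
```

Note that "^Merc" would also match a name such as "Mercury Cougar". That does not matter for mtcars, where every row name starting with "Merc" is a Mercedes, but it is worth checking on other data.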

Finally, we can use the logical vector to subset the rows of the mtcars data frame, and then calculate the mean of mpg for that subset.

mtcars_merc <- mtcars[grepl("^Merc", rownames(mtcars)), ]
mean(mtcars_merc$mpg)
# [1] 19.01429
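
Since the question mentions GROUP BY, here is a rough base R sketch of that idea using tapply. It assumes the make is everything before the first space in the row name, which is my own simplification (it treats "Hornet" and "Camaro" as makes, for example). The names make and mpg_by_make are made up for illustration.

```r
# Assumption: the make is the part of the row name before the first space
make <- sub(" .*$", "", rownames(mtcars))

# Mean mpg per make, similar in spirit to SQL's GROUP BY
mpg_by_make <- tapply(mtcars$mpg, make, mean)

# Pick out the Merc group; this should agree with the 19.01429 computed above
mpg_by_make["Merc"]
```

This gives the mean for every make at once, which is handy if you want to compare Mercedes against the other groups.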

Upvotes: 4
