Michelle Thompson

Using tapply to calculate the mean of a group

In the following CSV file:

Species, Age
australian, 2.6
australian, 2.3
brown, 2.3
brown, 2.3
brown, 3.4
brown, 3.4
dalmatian, 5.1
dalmatian, 4.4
dalmatian, 4.4
dalmatian, 4.1
dalmatian, 4.2
dalmatian, 4.7
dalmatian, 5.5

I am attempting to calculate the mean age of each pelican species, but R displays an error about unequal lengths:

df <- read.csv('c:/Users/Michelle/Downloads/pelican.csv')
tapply(df$Species, df$Age, mean)

Error in tapply(df$Species, df$Age, mean) : arguments must have same length

I expected tapply() to return each pelican species along with its mean age. Unfortunately, the director at the University of Florida insists that I use base R functions.

Edit 1:

str(df)
'data.frame':  13 obs. of  2 variables:
 $ Species: chr  "australian" "australian" "brown" "brown" ...
 $ Age    : num  2.6 2.3 2.3 2.3 3.4 3.4 5.1 4.4 4.4 4.1 ...

dput(df)
structure(list(Species = c("australian", "australian", "brown", "brown", "brown", "brown", "dalmatian", "dalmatian", "dalmatian", "dalmatian", "dalmatian", "dalmatian", "dalmatian"), Age = c(2.6, 2.3, 2.3, 2.3, 3.4, 3.4, 5.1, 4.4, 4.4, 4.1, 4.2, 4.7, 5.5)), class = "data.frame", row.names = c(NA, -13L))

Thank you, Pedro, for the help.

Thank you for any help you can provide.

M.

Answers (1)

Pedro Faria

Welcome, Michelle! The tapply() function works with two main objects, called X and INDEX (usually both are vectors): X holds the values to summarise, and INDEX holds the group that each value belongs to. The error message is telling you that X and INDEX do not have the same length.
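To make the two roles concrete, here is a minimal sketch with a made-up four-row subset of the ages (the vector names ages and species are illustrative, not objects from your session). Each value in X has exactly one matching label in INDEX, which is why both must be the same length.

```r
# Hypothetical ages (X) and the species each age belongs to (INDEX).
# Both vectors have 4 elements, so tapply() can pair them up one to one.
ages    <- c(2.6, 2.3, 5.1, 4.4)
species <- c("australian", "australian", "dalmatian", "dalmatian")

# X (the values) comes first, INDEX (the groups) second, FUN third
group_means <- tapply(X = ages, INDEX = species, FUN = mean)
print(group_means)
```

The order matters: passing the species as X and the ages as INDEX would make tapply() try to average character strings, which is not what you want even when the lengths match.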

The example below reproduces the same error you are facing. Note that X has 4 elements, but INDEX has only 2.

tapply(X = c(5, 6, 7, 8), INDEX = c(1, 2), mean)

This means that, to fix the error, the first and second objects you pass to tapply() need to have the same length. In your example, these two objects are df$Species and df$Age. You can check whether they have the same length by comparing the results of length(df$Species) and length(df$Age): if the two numbers are equal, the vectors have the same length; if not, they have different lengths.
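Here is a sketch of that check in base R. It rebuilds df from the dput() output posted in the question, so the lengths come out equal there. The last part uses a deliberately non-existent column name, age_years (hypothetical, not in your data), to show one way a length mismatch can happen: $ returns NULL for a column that does not exist, and NULL has length 0.

```r
# Rebuild df from the dput() output posted in the question
df <- data.frame(
  Species = c("australian", "australian", "brown", "brown", "brown", "brown",
              rep("dalmatian", 7)),
  Age = c(2.6, 2.3, 2.3, 2.3, 3.4, 3.4, 5.1, 4.4, 4.4, 4.1, 4.2, 4.7, 5.5)
)

# Both columns have 13 elements, so tapply() accepts them
print(length(df$Species))
print(length(df$Age))

# Hypothetical: 'age_years' is not a column of df, so df$age_years is NULL
print(length(df$age_years))

# Passing that NULL as INDEX reproduces the error from the question
msg <- tryCatch(
  tapply(df$Age, df$age_years, mean),
  error = function(e) conditionMessage(e)
)
print(msg)
```

On your own data, running the two length() lines right after read.csv() should show whether Species and Age really came through as expected.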

What is probably going wrong is that read.csv() is not reading your CSV file the way you expect, so df may not be a data.frame with the columns you think it has. We cannot give more specific help than this, because we do not know what the df object is or how it is structured in your R session.
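A few base R calls can show how df is structured before you call tapply(). In this sketch the two-row df is a hypothetical stand-in; in your session you would run only the print() lines on your own df.

```r
# Hypothetical two-row stand-in for the df in your session
df <- data.frame(Species = c("brown", "dalmatian"), Age = c(2.3, 5.1))

print(class(df))          # expected: "data.frame"
print(is.data.frame(df))  # TRUE for a data frame
print(names(df))          # column names that df$Species and df$Age rely on
print(sapply(df, class))  # type of each column; Age should be "numeric"
```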

You can give us this information by copying and pasting the output of str(df) or dput(df). Either command would probably give us enough information to point out exactly what you need to do. So, the next time you post a question, it is a good idea to include this information.

Anyway, when I copy and paste the CSV content you provided and run tapply() with Age as the values and Species as the groups, everything works fine. So, again, your df object is probably not structured as you expected, most likely because of a problem with the read.csv() call.

text <- "
Species, Age
australian, 2.6
australian, 2.3
brown, 2.3
brown, 2.3
brown, 3.4
brown, 3.4
dalmatian, 5.1
dalmatian, 4.4
dalmatian, 4.4
dalmatian, 4.1
dalmatian, 4.2
dalmatian, 4.7
dalmatian, 5.5"

data <- readr::read_csv(text)
tapply(data$Age, data$Species, mean)

Result:

australian      brown  dalmatian 
  2.450000   2.850000   4.628571 
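Because the question says only base R functions may be used, here is a sketch of the same reproduction without readr, assuming the file contains exactly the CSV text shown in the question. read.csv() accepts literal data through its text argument; with your real file you would pass the file path instead, as in the question.

```r
# Same CSV content as in the question, read with base R only
text <- "Species, Age
australian, 2.6
australian, 2.3
brown, 2.3
brown, 2.3
brown, 3.4
brown, 3.4
dalmatian, 5.1
dalmatian, 4.4
dalmatian, 4.4
dalmatian, 4.1
dalmatian, 4.2
dalmatian, 4.7
dalmatian, 5.5"

# strip.white = TRUE removes the space that follows each comma in the data rows
data <- read.csv(text = text, strip.white = TRUE)

# Values (Age) first, groups (Species) second
res <- tapply(data$Age, data$Species, mean)
print(res)
```

This should print the same three means as the readr version above.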
