Issue with naming new column using aggregate

Question

For some reason, aggregate is giving me the wrong column names, even though the data are still coming out correct. Can anyone tell me why (am I doing something wrong)?

For example, with a dataframe df:

df <- structure(list(Site = c(1L, 1L, 1L, 2L, 2L, 2L), Sample = c(1L, 
2L, 3L, 1L, 2L, 3L), Diameter = 1:6), .Names = c("Site", "Sample", 
"Diameter"), class = "data.frame", row.names = c(NA, -6L))

which looks like

  Site Sample Diameter
1    1      1        1
2    1      2        2
3    1      3        3
4    2      1        4
5    2      2        5
6    2      3        6

I run the following code

# Add column to calculate area from diameter
df['Area'] = ((df['Diameter']/2)^2)*pi

# Subset sites
Site1 <- subset(df, Site == "1")

# Calculate total area for each site
Site1_area <- aggregate(Site1$Area, by=list(Sample=Site1$Sample), sum, na.rm=TRUE)

Site1_area

This gives the new dataframe Site1_area as

  Sample  Diameter
1      1 0.7853982
2      2 3.1415927
3      3 7.0685835

where the calculated areas have been preserved, but the column name is now incorrectly given as Diameter instead of Area. I know I can rename this using

colnames(Site1_area) <- c("Sample", "Area")

but it seems odd to me that the column isn't being named correctly to begin with. Can anyone tell me why? Am I doing something incorrectly?

Many thanks!

IRTFM · Accepted Answer

You made an error that wasn't caught when you did this:

df['Area'] = ((df['Diameter']/2)^2)*pi

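Single brackets return a one-column data frame that still carries the name "Diameter", whereas double brackets return the bare vector. It should have been: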

df[['Area']] = ((df[['Diameter']]/2)^2)*pi
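To make the difference between the two bracket forms concrete, here is a minimal sketch that rebuilds the example data and inspects what each form returns. The names `single` and `double` are just illustrative helpers and are not part of the original code.

```r
# Rebuild the example data from the question
df <- data.frame(Site = c(1L, 1L, 1L, 2L, 2L, 2L),
                 Sample = c(1L, 2L, 3L, 1L, 2L, 3L),
                 Diameter = 1:6)

single <- df['Diameter']    # one-column data frame, still named "Diameter"
double <- df[['Diameter']]  # plain integer vector, no column name attached

class(single)   # "data.frame"
names(single)   # "Diameter"
class(double)   # "integer"

# Assigning a plain vector creates an ordinary numeric column called "Area"
df[['Area']] <- ((df[['Diameter']] / 2)^2) * pi
names(df)
```

Because the right-hand side is now a bare vector, there is no leftover "Diameter" label that could travel along with the new column.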

After your original single-bracket assignment, you had:

> df
  Site Sample Diameter   Diameter
1    1      1        1  0.7853982
2    1      2        2  3.1415927
3    1      3        3  7.0685835
4    2      1        4 12.5663706
5    2      2        5 19.6349541
6    2      3        6 28.2743339

So you never really had a column named "Area". If you want simple labeling, try the formula method of aggregate (aggregate.formula), which names the result column after the variable on the left-hand side of the formula:

> Site1_area2 <- aggregate(Area~Sample, data=Site1, sum, na.rm=TRUE)
> Site1_area2
  Sample      Area
1      1 0.7853982
2      2 3.1415927
3      3 7.0685835
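
If you would rather keep the non-formula interface, one option is to pass the values as a named list, in the same way the grouping variable is already named in `by`. The sketch below assumes the column was created with double brackets; `Site1_area3` is just an illustrative name.

```r
# Rebuild the example data and create Area as a plain numeric column
df <- data.frame(Site = c(1L, 1L, 1L, 2L, 2L, 2L),
                 Sample = c(1L, 2L, 3L, 1L, 2L, 3L),
                 Diameter = 1:6)
df[['Area']] <- ((df[['Diameter']] / 2)^2) * pi

# Keep only the rows for site 1
Site1 <- subset(df, Site == 1)

# The name given inside list() for x becomes the name of the result column
Site1_area3 <- aggregate(x = list(Area = Site1$Area),
                         by = list(Sample = Site1$Sample),
                         FUN = sum, na.rm = TRUE)
names(Site1_area3)
```

With a bare unnamed vector as the first argument, aggregate has no variable name to carry over, so naming the list element is what supplies the label.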
