Gie

Reputation: 31

Why do the aggregate data frame method and the aggregate formula method not return the same results?

I just started learning R last month and I am learning the aggregate functions.

To start off, I have a data set called property_data and I am trying to get the mean price per city.

I first used the formula method of aggregate:

mean_price_per_city_1 <- aggregate(PRICE ~ PROPERTYCITY,
  property_data, mean)

The results are as follows (just the head):

PROPERTYCITY  PRICE
              1.00
ALLISON PARK  193814.08
AMBRIDGE      62328.92
ASPINWALL     226505.50
BADEN         400657.52
BAIRDFORD     59337.37

Then I decided to try the data frame method:

mean_price_per_city_2 <- aggregate(list(property_data$PRICE),
                             by = list(property_data$PROPERTYCITY),
                             FUN = mean)

The results are as follows (just the head):

Group.1       c.12000L.. 1783L..4643L..
              1.00
ALLISON PARK  NA
AMBRIDGE      62328.92
ASPINWALL     226505.50
BADEN         400657.52
BAIRDFORD     59337.37

I thought that the two methods would return the same results. However, I noticed that when I used the data frame method, there are NAs in the second column.

I tried checking whether there are NAs in the PRICE column, but I found none. So I am lost as to why the two methods don't return the same values.

Upvotes: 1

Views: 264

Answers (1)

dcarlson

Reputation: 11046

You have two issues. First, in aggregate(list(property_data$PRICE), by = list(property_data$PROPERTYCITY), FUN = mean), pass property_data$PRICE directly instead of wrapping it in list(). Only the by= argument must be a list. The unnamed list is why your column name is so strange. Second, as documented in the manual page (?aggregate), the formula method has a default value of na.action=na.omit, but the method for class data.frame does not. Since you have at least one missing value in the ALLISON PARK group, the formula command deleted that row before computing the mean, but the second command did not, and mean() returns NA by default when any input is NA, so the result for ALLISON PARK is NA.
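
To make both points concrete, here is a minimal sketch. The small data frame is a made-up stand-in for property_data (the real data is not shown in the question), so the numbers are illustrative only. It shows the NA appearing with the data frame method and one way to get matching results, by passing na.rm = TRUE through to mean() and naming the list elements.

```r
# Hypothetical stand-in for property_data; the values are invented for illustration.
property_data <- data.frame(
  PROPERTYCITY = c("ALLISON PARK", "ALLISON PARK", "AMBRIDGE", "AMBRIDGE"),
  PRICE        = c(200000, NA, 50000, 70000)
)

# Check for missing prices, overall and per city.
anyNA(property_data$PRICE)
tapply(is.na(property_data$PRICE), property_data$PROPERTYCITY, sum)

# Formula method: rows with NA are dropped first (na.action = na.omit).
by_formula <- aggregate(PRICE ~ PROPERTYCITY, property_data, mean)

# Data frame method without na.rm: the NA propagates through mean().
# A bare vector gets the default column name "x".
by_df <- aggregate(property_data$PRICE,
                   by = list(PROPERTYCITY = property_data$PROPERTYCITY),
                   FUN = mean)

# Extra arguments are passed on to FUN, so na.rm = TRUE reaches mean().
# Naming the list elements gives readable column names.
by_df_fixed <- aggregate(list(PRICE = property_data$PRICE),
                         by = list(PROPERTYCITY = property_data$PROPERTYCITY),
                         FUN = mean, na.rm = TRUE)

print(by_formula)
print(by_df)
print(by_df_fixed)
```

With this toy data, by_df has NA for ALLISON PARK, while by_formula and by_df_fixed agree. Going the other way, calling the formula method with na.action = na.pass should reproduce the NA.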

Upvotes: 3
