Reputation: 31
I started learning R last month and I am currently working through the aggregate functions.
To start off, I have a data frame called property_data and I am trying to get the mean price per city.
I first used the formula method of aggregate:
mean_price_per_city_1 <- aggregate(PRICE ~ PROPERTYCITY,
property_data, mean)
The results are as follows (just the head):
PROPERTYCITY | PRICE |
---|---|
1.00 | |
ALLISON PARK | 193814.08 |
AMBRIDGE | 62328.92 |
ASPINWALL | 226505.50 |
BADEN | 400657.52 |
BAIRDFORD | 59337.37 |
Then I decided to try the data frame method:
mean_price_per_city_2 <- aggregate(list(property_data$PRICE),
by = list(property_data$PROPERTYCITY),
FUN = mean)
The results are as follows (just the head):
Group.1 | c.12000L.. 1783L..4643L.. |
---|---|
1.00 | |
ALLISON PARK | NA |
AMBRIDGE | 62328.92 |
ASPINWALL | 226505.50 |
BADEN | 400657.52 |
BAIRDFORD | 59337.37 |
I thought that the two methods would return the same results. However, I noticed that when I used the data frame method, there are NAs in the second column.
I tried checking whether there are NAs in the PRICE column, but found none. So I am lost as to why the two methods don't return the same values.
Upvotes: 1
Views: 264
Reputation: 11046
You have two issues. First, aggregate(list(property_data$PRICE), by = list(property_data$PROPERTYCITY), FUN = mean) should just have property_data$PRICE without the list. Only the by= argument must be a list. That is why your column name is so strange.
Second, as documented in the manual page (?aggregate), the formula method has a default value of na.action=na.omit, but the method for class data.frame does not. Since you have at least one missing value in the ALLISON PARK group, the formula command deleted that value before computing the mean, but the second command did not, so the result for ALLISON PARK is NA.
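A minimal sketch of the difference, using a small made-up data frame (toy, with invented cities "A" and "B") rather than your actual property_data. It assumes the missing values are ordinary NAs in PRICE, and it shows one possible fix for the data frame method: passing na.rm = TRUE through to mean().

```r
# Toy data standing in for property_data (values are invented for illustration)
toy <- data.frame(
  PROPERTYCITY = c("A", "A", "B", "B"),
  PRICE        = c(100, NA, 200, 400)
)

# Count missing prices per city to see which groups are affected
na_per_city <- tapply(is.na(toy$PRICE), toy$PROPERTYCITY, sum)

# Formula method: rows containing NA are dropped first (na.action = na.omit)
by_formula <- aggregate(PRICE ~ PROPERTYCITY, data = toy, FUN = mean)

# Data frame method with no NA handling: the mean for city "A" is NA
by_df_na <- aggregate(toy["PRICE"],
                      by = list(PROPERTYCITY = toy$PROPERTYCITY),
                      FUN = mean)

# Data frame method with na.rm = TRUE passed on to mean()
by_df <- aggregate(toy["PRICE"],
                   by = list(PROPERTYCITY = toy$PROPERTYCITY),
                   FUN = mean, na.rm = TRUE)

print(na_per_city)
print(by_formula)
print(by_df_na)
print(by_df)
```

Passing toy["PRICE"] (a one-column data frame) instead of list(toy$PRICE) keeps the column name PRICE in the result, and naming the element of the by list replaces Group.1 with PROPERTYCITY.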
Upvotes: 3