Reputation: 560
I have two columns, 'Name' and 'Price'. I'd like to determine the standard deviation of the prices for each group/category of the column 'Name'. These columns are in the form of a data frame in R.
Name Price
1 Llandudno 21700000
2 Llandudno 15500000
3 Llandudno 9800000
4 Hout Bay 2995000
Upvotes: 0
Views: 298
Reputation: 887531
We can use sqldf
. The advantage is that we don't need to translate the code if we work on sql
as well.
library(sqldf)
sqldf("SELECT Suburb, stdev(Price) AS SDPrice
FROM df1
GROUP BY Suburb")
# Suburb Price
#1 Hout Bay 0
#2 Llandudno 5951750
Upvotes: 1
Reputation: 38510
Say you have a data.frame named df. Here is a base R method using aggregate
:
aggregate(Price ~ Suburb, data=df, FUN=sd)
This creates a data.frame that has the standard deviations by name.
The equivalent of aggregate
using the data.table
package, would be as follows:
library(data.table)
setDT(df)
df[, list("Name"=sd(Price)), by=.(Suburb)]
data
df <- read.table(header=TRUE, text="Suburb Price
Llandudno 21700000
Llandudno 15500000
Llandudno 9800000
Hout_Bay 2995000
Hout_Bay 3995000")
Upvotes: 1
Reputation: 51592
You can use dplyr
,
library(dplyr)
df %>%
group_by(Name) %>%
summarise(new = sd(Price))
#Source: local data frame [2 x 2]
# Suburb new
# (chr) (dbl)
#1 Hout Bay NaN
#2 Llandudno 5951750
Note that 'Hout Bay' gives NaN which is expected since you only have one value
Upvotes: 1