Richard Ball
Richard Ball

Reputation: 560

How do I determine the standard deviation of multiple categories within the same column in R

I have two columns, 'Name' and 'Price'. I'd like to determine the standard deviation of the prices for each group/category of the column 'Name'. These columns are in the form of a data frame in R.

                   Name    Price
1               Llandudno 21700000
2               Llandudno 15500000
3               Llandudno  9800000
4                Hout Bay  2995000

Upvotes: 0

Views: 298

Answers (3)

akrun
akrun

Reputation: 887531

We can use sqldf. The advantage is that we don't need to translate the code if we work on sql as well.

library(sqldf)
sqldf("SELECT Suburb, stdev(Price) AS SDPrice
       FROM df1
       GROUP BY Suburb")
#     Suburb   Price
#1  Hout Bay       0
#2 Llandudno 5951750

Upvotes: 1

lmo
lmo

Reputation: 38510

Say you have a data.frame named df. Here is a base R method using aggregate:

aggregate(Price ~ Suburb, data=df, FUN=sd)

This creates a data.frame that has the standard deviations by name.


The equivalent of aggregate using the data.table package, would be as follows:

library(data.table)
setDT(df)

df[, list("Name"=sd(Price)), by=.(Suburb)]

data

df <- read.table(header=TRUE, text="Suburb    Price
Llandudno 21700000
                 Llandudno 15500000
                 Llandudno  9800000
                 Hout_Bay  2995000
                 Hout_Bay  3995000")

Upvotes: 1

Sotos
Sotos

Reputation: 51592

You can use dplyr,

library(dplyr)
df %>% 
   group_by(Name) %>% 
   summarise(new = sd(Price))

#Source: local data frame [2 x 2]

#     Suburb     new
#      (chr)   (dbl)
#1  Hout Bay     NaN
#2 Llandudno 5951750

Note that 'Hout Bay' gives NaN which is expected since you only have one value

Upvotes: 1

Related Questions