Reputation: 43
Very new to R.
I have a simple data set with two columns: name and length. Some names occur twice. How do I average the lengths for those names and list each name only once, with its averaged length, instead of twice? Thank you.
Upvotes: 0
Views: 9201
Reputation: 887501
Or using data.table
(data from @Marat Talipov's post)
library(data.table)
setDT(d)[, list(length=mean(length)), name][]
Upvotes: 1
Reputation: 8377
And how about an original solution with a linear fit, in just one line:
lm(length ~ name - 1, df)$coef
### namea nameb namec
### 5.0 8.5 7.0
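To see why this works, here is a small check, offered as a sketch rather than a definitive explanation. It assumes `df` holds the five-row sample data posted elsewhere in this thread (typed in by hand here). With the intercept removed, the model has one coefficient per name, and the least-squares estimate for each one is the mean of that group.

```r
# Sample data copied by hand from the reproducible example in this thread
df <- data.frame(name = c("a", "b", "b", "c", "a"),
                 length = c(9, 10, 7, 7, 1))

# "- 1" drops the intercept, so lm() fits one coefficient per level of name
fit_means <- coef(lm(length ~ name - 1, data = df))

# Plain per-name means, for comparison
group_means <- tapply(df$length, df$name, mean)

# Compare the values only; the names differ ("namea" vs "a")
all.equal(unname(fit_means), as.vector(group_means))
```

The coefficients come back as a named vector (`namea`, `nameb`, ...), so an extra step is needed if you want a data frame with the original names.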
Upvotes: 3
Reputation: 301
If I understand you correctly, you're looking to calculate the mean length for each name. I'd tackle it like this.
library(plyr)
df.new <- ddply(df, .(name), summarise, length=mean(length))
Given you're new to R, I encourage you to take the time to learn some of Hadley Wickham's packages: plyr (or dplyr), reshape2 and ggplot2. They're specifically designed to make lots of these data operations more intuitive than base R.
Upvotes: 0
Reputation: 13304
Here are a couple of approaches:
-With base R:
aggregate(length~name,d,mean)
# name length
# 1 a 5.0
# 2 b 8.5
# 3 c 7.0
-With the dplyr package (definitely worth spending time to explore):
library(dplyr)
d %>%
  group_by(name) %>%
  summarize(avg=mean(length))
# Source: local data frame [3 x 2]
#
# name avg
# 1 a 5.0
# 2 b 8.5
# 3 c 7.0
A sample reproducible data set can be produced with these commands:
set.seed(1)
d <- data.frame(name=sample(letters[1:3],size=5,replace=TRUE),length=sample(10,size=5,replace=TRUE))
# name length
# 1 a 9
# 2 b 10
# 3 b 7
# 4 c 7
# 5 a 1
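One caveat: R 3.6.0 changed the default algorithm behind sample(), so set.seed(1) may not give the rows shown above on a current installation. As a fallback, here is a minimal sketch that types the same five rows in directly and runs the base R approach on them. The variable name `avg` is only illustrative.

```r
# The same five rows as in the printed sample, entered by hand
d <- data.frame(name = c("a", "b", "b", "c", "a"),
                length = c(9, 10, 7, 7, 1))

# One row per name, with length replaced by the mean length for that name
avg <- aggregate(length ~ name, data = d, FUN = mean)
avg
```

The result keeps the original column names, which is handy if the averaged table is meant to replace the original one.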
Upvotes: 14
Reputation: 139
Definitely not the R way or the best way, but you could do:
library(sqldf)
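df <- read.csv("howeveryougetyourdata.csv")  # placeholder: load your data however you get it
# Average length over the rows whose name is "this" or "that"
sqldf("SELECT AVG(length) AS average_length FROM df WHERE name IN ('this', 'that')")
# For one average per name instead: SELECT name, AVG(length) AS length FROM df GROUP BY name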
Upvotes: 0