user4634200

Reputation: 43

R - Averaging rows that have the same name

Very new to R.

I have a simple data set with two columns: name and length. Some names occur twice. How do I average the lengths for those names and list each name only once, with its averaged length? Thank you.

Upvotes: 0

Views: 9201

Answers (5)

akrun

Reputation: 887501

Or using data.table (data from @Marat Talipov's post)

library(data.table)
# setDT() converts d to a data.table by reference; the trailing [] prints the result
setDT(d)[, list(length = mean(length)), by = name][]

Upvotes: 1

agenis

Reputation: 8377

How about an original solution using a linear fit, in just one line:

    lm(length ~ name - 1, df)$coef
    ### namea nameb namec 
    ###   5.0   8.5   7.0 
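
To see why this works: with the intercept removed (`- 1`), `lm()` fits one coefficient per level of `name`, and the least-squares estimate for each level is simply that group's mean. Here is a minimal sketch that checks this against `tapply()`, assuming `df` holds the five sample rows from Marat Talipov's answer (typed in by hand here, so the data is an assumption):

```r
# Sample rows copied by hand from the other answer (assumed data)
df <- data.frame(name = c("a", "b", "b", "c", "a"),
                 length = c(9, 10, 7, 7, 1))

# Without an intercept, lm() fits one coefficient per level of name
fit_means <- coef(lm(length ~ name - 1, data = df))

# Plain per-group means for comparison
group_means <- tapply(df$length, df$name, mean)

# Names differ ("namea" vs "a"), so compare the values only
all.equal(unname(fit_means), as.vector(group_means))
```

It is a neat trick, but for anything beyond a quick check, a dedicated grouping function is probably easier to read.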

Upvotes: 3

AJD

Reputation: 301

If I understand you correctly, you're looking to calculate the mean length for each name. I'd tackle it like this.

library(plyr)
df.new <- ddply(df, .(name), summarise, length=mean(length))

Given you're new to R, I encourage you to take the time to learn some of Hadley Wickham's packages: plyr (or dplyr), reshape2 and ggplot2. They're specifically designed to make many of these data operations more intuitive than in base R.

Upvotes: 0

Marat Talipov

Reputation: 13304

Here are a couple of approaches:

- With base R:

aggregate(length~name,d,mean)
#   name length
# 1    a    5.0
# 2    b    8.5
# 3    c    7.0

- With the dplyr package (definitely worth spending time to explore):

library(dplyr)
d %>% 
  group_by(name) %>% 
  summarize(avg=mean(length))
# Source: local data frame [3 x 2]
# 
# name avg
# 1    a 5.0
# 2    b 8.5
# 3    c 7.0

A reproducible sample data set can be produced with these commands:

set.seed(1)
d <- data.frame(name=sample(letters[1:3],size=5,replace=TRUE),length=sample(10,size=5,replace=TRUE))

#   name length
# 1    a      9
# 2    b     10
# 3    b      7
# 4    c      7
# 5    a      1
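
Note that `sample()` may return different rows for the same seed on other R versions (the default sampling algorithm changed in R 3.6.0), so the table above may not reproduce exactly. Here is a sketch that types the same five rows in directly, so the results shown earlier can be checked without relying on the random number generator:

```r
# The five rows shown above, entered by hand instead of sampled
d <- data.frame(name = c("a", "b", "b", "c", "a"),
                length = c(9, 10, 7, 7, 1))

# One row per name, with the mean length
res <- aggregate(length ~ name, data = d, FUN = mean)
print(res)
```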

Upvotes: 14

bbowler86

Reputation: 139

Definitely not the R way or the best way, but you could do:

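library(sqldf)
# Load the data however you get it, e.g. from a CSV file
df <- read.csv("howeveryougetyourdata.csv")
# Average length per name, limited to the names of interest
sqldf("SELECT name, AVG(length) AS average_length FROM df WHERE name IN ('this', 'that') GROUP BY name")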

Upvotes: 0
