Reputation: 49
Hi, I am trying to combine duplicate rows of data in R using ddply. Here is an example of the data I am working with:
name <- c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary")
age <- c(30, 20, 30, 25, 29, 20)
address <- c("123 Fake Street", "321 Park Ave", "123 Fake Street", "49 Rodeo Drive", "10 Broadway", "321 Park Ave")
election.count <- c("1", "1", "1", "1", "1", "1")
df <- data.frame(name, age, address, election.count)
name age address election.count
1 Bob 30 123 Fake Street 1
2 Mary 20 321 Park Ave 1
3 Bob 30 123 Fake Street 1
4 Dillan 25 49 Rodeo Drive 1
5 Bob 29 10 Broadway 1
6 Mary 20 321 Park Ave 1
I am looking to combine rows that have the same name and age. Using ddply I get:
ddply(df, "name", numcolwise(sum))
name age election.count
1 Bob 89 3
2 Dillan 25 1
3 Mary 20 2
Is there a modification to the ddply call that would give me the following?
name age address election.count
1 Bob 30 123 Fake Street 2
2 Bob 29 10 Broadway 1
3 Dillan 25 49 Rodeo Drive 1
4 Mary 20 321 Park Ave 2
Upvotes: 0
Views: 353
Reputation: 3710
You can also set the name of the count column yourself:
ddply(df, .(name, age), summarize, election.count=nrow(piece))
# name age election.count
# 1 Bob 29 1
# 2 Bob 30 2
# 3 Dillan 25 1
# 4 Mary 20 2
Upvotes: 0
Reputation: 7826
library(dplyr)
df %>%
group_by(name, age) %>%
tally()
and you get:
Source: local data frame [4 x 3]
Groups: name [?]
name age n
(fctr) (dbl) (int)
1 Bob 29 1
2 Bob 30 2
3 Dillan 25 1
4 Mary 20 2
Update:
@David is right. count() is a much simpler choice. :)
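For reference, here is a minimal sketch of what the count() version might look like. I have added 'address' to the grouping so it shows up in the result, and the rename() step is my own addition to match the column name from the question; the data frame is rebuilt here only so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question (election.count is not needed for counting rows)
df <- data.frame(
  name = c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary"),
  age = c(30, 20, 30, 25, 29, 20),
  address = c("123 Fake Street", "321 Park Ave", "123 Fake Street",
              "49 Rodeo Drive", "10 Broadway", "321 Park Ave"),
  stringsAsFactors = FALSE
)

# count() is shorthand for group_by() followed by tally();
# the count column is called n by default, so rename it afterwards
counts <- df %>%
  count(name, age, address) %>%
  rename(election.count = n)

print(counts)
```

Dropping address from the count() call gives the same per-name-and-age counts as the tally() version above.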
Upvotes: 1
Reputation: 886938
You can also include 'address' as a grouping variable to get the expected output. Using data.table, we convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'name', 'age', and 'address', get the number of rows (.N).
library(data.table)
setDT(df)[, list(election.count=.N), .(name, age, address)]
# name age address election.count
#1: Bob 30 123 Fake Street 2
#2: Mary 20 321 Park Ave 2
#3: Dillan 25 49 Rodeo Drive 1
#4: Bob 29 10 Broadway 1
Upvotes: 1
Reputation: 15783
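I don't get the election.count output from the ddply(df, "name", numcolwise(sum)) call, only name and age (as a sum). That is presumably because election.count is created as a character vector, and numcolwise only operates on numeric columns.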
That said, you can group by multiple columns in plyr functions using the .(col1, col2) syntax. For example, I think this gets what you want:
ddply(df, .(name, age), nrow)
# name age V1
# 1 Bob 29 1
# 2 Bob 30 2
# 3 Dillan 25 1
# 4 Mary 20 2
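If you would rather sum the existing election.count column than count rows, something along these lines should work. This is only a sketch: it assumes the counts can be safely converted with as.numeric(), and it adds 'address' to the grouping so that column is kept in the result.

```r
library(plyr)

# Rebuild the example data from the question, with election.count as character
df <- data.frame(
  name = c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary"),
  age = c(30, 20, 30, 25, 29, 20),
  address = c("123 Fake Street", "321 Park Ave", "123 Fake Street",
              "49 Rodeo Drive", "10 Broadway", "321 Park Ave"),
  election.count = c("1", "1", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Convert the character counts to numbers so they can be summed
df$election.count <- as.numeric(df$election.count)

# Group by every column that identifies a duplicate, then sum the counts per group
result <- ddply(df, .(name, age, address), summarize,
                election.count = sum(election.count))

print(result)
```

Because age is now part of the grouping, it is no longer summed the way it was in the numcolwise(sum) call from the question.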
Upvotes: -1