Vincent Collis

Reputation: 49

Consolidating duplicate rows in R using ddply

Hi, I am trying to combine duplicate rows of data in R using ddply from the plyr package. Here is an example of the data I am working with:

name <- c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary")
age <- c(30, 20, 30, 25, 29, 20)
address <- c("123 Fake Street", "321 Park Ave", "123 Fake Street", "49 Rodeo Drive", "10 Broadway", "321 Park Ave")
election.count <- c("1", "1", "1", "1", "1", "1")
df <- data.frame(name, age, address, election.count)

    name age             address election.count
1    Bob  30     123 Fake Street             1
2   Mary  20        321 Park Ave             1
3    Bob  30     123 Fake Street             1
4 Dillan  25      49 Rodeo Drive             1
5    Bob  29         10 Broadway             1
6   Mary  20        321 Park Ave             1

I am looking to combine rows that have the same name and age. Using ddply I get:

library(plyr)
ddply(df, "name", numcolwise(sum))

    name age   election.count
1    Bob  89                3
2 Dillan  25                1
3   Mary  20                2

Is there a way to modify the ddply call so that I get the following instead?

    name age              address  election.count
1    Bob  30      123 Fake Street               2
2    Bob  29          10 Broadway               1
3 Dillan  25       49 Rodeo Drive               1
4   Mary  20         321 Park Ave               2

Upvotes: 0

Views: 353

Answers (4)

Ven Yao

Reputation: 3710

You can also name the count column directly by grouping on both columns and using summarize:

ddply(df, .(name, age), summarize, election.count=nrow(piece))
#    name age election.count
# 1    Bob  29              1
# 2    Bob  30              2
# 3 Dillan  25              1
# 4   Mary  20              2
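Note that piece is not a column of df; it appears to resolve to a variable inside plyr's own machinery, so relying on it may be fragile. A minimal sketch of the same idea, assuming it is acceptable to count rows with length() on one of the grouped columns and to add address to the grouping so that it is kept in the output:

```r
library(plyr)

# Sample data from the question
df <- data.frame(
  name = c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary"),
  age = c(30, 20, 30, 25, 29, 20),
  address = c("123 Fake Street", "321 Park Ave", "123 Fake Street",
              "49 Rodeo Drive", "10 Broadway", "321 Park Ave"),
  election.count = c("1", "1", "1", "1", "1", "1")
)

# Group by all three identifying columns so address survives,
# then count the rows in each group via the length of a column
result <- ddply(df, .(name, age, address), summarise,
                election.count = length(name))
print(result)
```

This should give one row per distinct name/age/address combination, with election.count holding the number of duplicates.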

Upvotes: 0

Hao

Reputation: 7826

library(dplyr)

df %>% 
  group_by(name, age) %>% 
  tally()

and you get

Source: local data frame [4 x 3]
Groups: name [?]

    name   age     n
    (fctr) (dbl) (int)
1    Bob    29     1
2    Bob    30     2
3 Dillan    25     1
4   Mary    20     2

Update: @David is right, count() is a much simpler choice. :)
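For reference, a short sketch of what the count() version might look like. Adding address to the counted columns and renaming the default n column to election.count are my assumptions about what the question wants, not part of the original suggestion:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  name = c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary"),
  age = c(30, 20, 30, 25, 29, 20),
  address = c("123 Fake Street", "321 Park Ave", "123 Fake Street",
              "49 Rodeo Drive", "10 Broadway", "321 Park Ave"),
  election.count = c("1", "1", "1", "1", "1", "1")
)

# count() is shorthand for group_by() followed by tally();
# the resulting count column is called n by default
result <- df %>%
  count(name, age, address) %>%
  rename(election.count = n)
print(result)
```

The original election.count column is dropped because count() keeps only the counted columns plus the new count.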

Upvotes: 1

akrun

Reputation: 886938

You can also include 'address' as a grouping variable to get the expected output. Using data.table, we convert the 'data.frame' to a 'data.table' with setDT(df), group by 'name', 'age' and 'address', and count the rows in each group with .N.

library(data.table)
setDT(df)[, list(election.count=.N), .(name, age, address)]
#     name age         address election.count
#1:    Bob  30 123 Fake Street              2
#2:   Mary  20    321 Park Ave              2
#3: Dillan  25  49 Rodeo Drive              1
#4:    Bob  29     10 Broadway              1
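One thing to be aware of: setDT() converts df in place, by reference. If the original data.frame should stay untouched, a possible variant (a sketch, with dt and result as illustrative names) is to work on a copy made with as.data.table() and sort the result afterwards:

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  name = c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary"),
  age = c(30, 20, 30, 25, 29, 20),
  address = c("123 Fake Street", "321 Park Ave", "123 Fake Street",
              "49 Rodeo Drive", "10 Broadway", "321 Park Ave"),
  election.count = c("1", "1", "1", "1", "1", "1")
)

# as.data.table() returns a copy, so df remains a plain data.frame
dt <- as.data.table(df)

# .N is the number of rows in each name/age/address group
result <- dt[, .(election.count = .N), by = .(name, age, address)]

# Optional: sort by name and age instead of order of first appearance
setorder(result, name, age)
print(result)
```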

Upvotes: 1

Max Ghenis

Reputation: 15783

I don't get the election.count output from the ddply(df, "name", numcolwise(sum)) call, only name and age (as a sum).

That said, you can group by multiple columns in plyr functions using .(col1, col2) syntax. For example, I think this gets what you want:

ddply(df, .(name, age), nrow)
#     name age V1
# 1    Bob  29  1
# 2    Bob  30  2
# 3 Dillan  25  1
# 4   Mary  20  2
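A likely reason election.count goes missing in the original call is that it was created from quoted values, so it is not a numeric column and numcolwise() skips it. If you would rather sum the existing counts than count rows, something along these lines might work; treat it as a sketch that assumes the values can always be converted to numbers, and that adds address to the grouping so it is kept:

```r
library(plyr)

# Sample data from the question
df <- data.frame(
  name = c("Bob", "Mary", "Bob", "Dillan", "Bob", "Mary"),
  age = c(30, 20, 30, 25, 29, 20),
  address = c("123 Fake Street", "321 Park Ave", "123 Fake Street",
              "49 Rodeo Drive", "10 Broadway", "321 Park Ave"),
  election.count = c("1", "1", "1", "1", "1", "1")
)

# The column is character (or factor in older R versions), not numeric
print(class(df$election.count))

# Convert via as.character() first so a factor is also handled correctly
df$election.count <- as.numeric(as.character(df$election.count))

# Sum the per-row counts within each name/age/address group
result <- ddply(df, .(name, age, address), summarise,
                election.count = sum(election.count))
print(result)
```

Summing rather than counting also keeps working if some rows already carry a count greater than 1.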

Upvotes: -1
