Reputation: 33
Suppose I have the following table,
Name | City | Value |
---|---|---|
Tom | NY | 1 |
Tom | NY | 2 |
Tom | NY | 4 |
James | NY | 1 |
James | NY | 2 |
Tony | DC | 1 |
Tony | DC | 2 |
I want to find the total value city-wise, however for each Name only their maximum value must be taken. So for this table I should get,
City | Value |
---|---|
NY | 6 |
DC | 2 |
I have tried using various methods using dyplr but none seem to work
Upvotes: 3
Views: 3226
Reputation: 16876
Here is a data.table
approach in one line:
library(data.table)
setDT(df)[df[, .I[which.max(Value)], by=c("Name", "City")]$V1][, sum(Value), by=City]
Output
City V1
1: NY 6
2: DC 2
base R
aggregate(Value ~ City, merge(aggregate(Value ~ Name + City, data=df, max), df, all.x=T), sum)
City Value
1 DC 2
2 NY 6
Data
df <- structure(list(
Name = c("Tom", "Tom", "Tom", "James", "James",
"Tony", "Tony"),
City = c("NY", "NY", "NY", "NY", "NY", "DC",
"DC"),
Value = c(1L, 2L, 4L, 1L, 2L, 1L, 2L)
),
class = "data.frame",
row.names = c(NA, -7L))
Upvotes: 1
Reputation: 389315
Here's another approach -
library(dplyr)
df %>%
arrange(desc(Value)) %>%
distinct(Name, City, .keep_all = TRUE) %>%
group_by(City) %>%
summarise(Value = sum(Value))
# City Value
# <chr> <int>
#1 DC 2
#2 NY 6
Upvotes: 2
Reputation: 1560
Lets assume your dataset is called df1:
library(dplyr)
df1 %>% group_by(City, Name) %>% slice_max(Value, with_ties = FALSE) %>% ungroup() %>% select(City, Value) %>% group_by(City) %>% summarise(Value = sum(Value))
Upvotes: 0