Mandy
Mandy

Reputation: 129

Summarize data with condition and create new row (dplyr)

I'll try illustrate my question with an example.

Sample data frame:

myData <- data.frame(Country = c("Germany","UK","Mexico","Spain"),
                     MyCount = c(300,800,950,125),
                     Continent = c("Europe","Europe","America","Europe"))  

Country  MyCount Continent
Germany  300     Europe
UK       800     Europe
Mexico   950     America
Spain    125     Europe

Expected result:

Country MyCount Continent
Other   425     Europe
UK      800     Europe

I have tried this.

myData %>%
  filter(Continent == "Europe" & MyCount < 800)%>%
  add_row(Country = "Other", MyCount = sum(MyCount), Continent = "Europe")  

Upvotes: 0

Views: 259

Answers (3)

jazzurro
jazzurro

Reputation: 23574

If I am analyzing your sample right, the following would be one way to go. You seem to want data from Europe, then aggregate it for countries which have more than or equal to 800 in MyCount and other European countries. If so, you could replace all levels of European countries with "Other" for those which have less than 800 in MyCount and aggregate the data.

filter(myData, Continent == "Europe") %>%
group_by(Country = fct_other(Country, keep = Country[MyCount >= 800])) %>%
summarise(MyCount = sum(MyCount))

#  Country MyCount
#   <fctr>   <dbl>
#1      UK     800
#2   Other     425

Upvotes: 1

jmuhlenkamp
jmuhlenkamp

Reputation: 2150

Not entirely clear what you are looking for, but this will give you the result you have posted in the question.

library(dplyr)
myData<-data.frame(Country=c("Germany","UK","Mexico","Spain"),MyCount=c(300,800,950,125),Continent=c("Europe","Europe","America","Europe")) 

myData %>%
    filter(Continent == 'Europe') %>%
    mutate(Country = as.character(Country),
           Country = ifelse(Country %in% c('UK'), Country, 'Other')) %>%
    group_by(Country, Continent) %>%
    summarize(MyCount = sum(MyCount)) %>%
    select(Country, MyCount, Continent)

# A tibble: 2 x 3
# Groups:   Country [2]
   Country MyCount Continent
     <chr>   <dbl>    <fctr>
1   Other     425    Europe
2      UK     800    Europe

Upvotes: 1

Taran
Taran

Reputation: 365

@Mandy I'm not entrily clear on the specific requirmnts for your use case but this should work based on your comments. Uses group_by and summarise from dplyr.

myData %>% 
       filter(Continent == 'Europe') %>% 
       mutate(grp = ifelse(MyCount < 800, 'Other', Country)) %>% 
       group_by(grp) %>% 
       summarise(MyCount = sum(MyCount))

# A tibble: 2 × 2
grp MyCount
<chr>   <dbl>
1 Other     425
2    UK     800

Upvotes: 1

Related Questions