Reputation: 15
I am trying to modify a data frame similar to the "cars" data frame (mine is called countries). Its type is list.
The first column holds the country names (type chr). The second column holds the data (type dbl).
       Country Number
1     Portugal 100000
2       Poland 200000
3       Israel 300000
4 South Africa 400000
5      Austria 500000
I want to group the countries in the first column under the name "Others" when Number is over 250000 (for example), and then plot a graph with ggplot.
I already have a good bar chart, but now I want one with an "Others" bar. The "Others" bar would simply be the sum of Number over the countries grouped into it.
Which method is the most efficient for manipulating the data: writing a function with "if" and applying it to the data frame, or creating a new column with two categories and then summing all the "Others"?
I have already tried to do this using the pipe %>% and mutate.
Upvotes: 1
Views: 47
Reputation: 306
One solution is to use dplyr to change the country name according to a rule and then aggregate the results with the group_by / summarise functions. Below you will find a small example.
library(dplyr)

# sample data from the question
countries <- data.frame(Country = c("Portugal", "Poland", "Israel", "South Africa", "Austria"),
                        Number = c(100000, 200000, 300000, 400000, 500000),
                        stringsAsFactors = FALSE)

# using dplyr: relabel countries above the threshold, then sum Number per label
countries_dp <- countries %>%
  mutate(Country = ifelse(Number > 250000, "Others", Country)) %>%
  group_by(Country) %>%
  summarise(Number = sum(Number))
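Since the question also asks for a ggplot bar chart, here is a minimal sketch of how the aggregated result could be plotted. It rebuilds the sample data so it runs on its own. The names threshold, countries_plot and p are my own, the grouped label is "Others" as in the question, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question
countries <- data.frame(
  Country = c("Portugal", "Poland", "Israel", "South Africa", "Austria"),
  Number  = c(100000, 200000, 300000, 400000, 500000),
  stringsAsFactors = FALSE
)

# Example cut-off from the question (hypothetical variable name)
threshold <- 250000

# Relabel every country above the threshold, then sum Number per label
countries_plot <- countries %>%
  mutate(Country = if_else(Number > threshold, "Others", Country)) %>%
  group_by(Country) %>%
  summarise(Number = sum(Number), .groups = "drop")

# One bar per remaining country plus a single "Others" bar
p <- ggplot(countries_plot, aes(x = Country, y = Number)) +
  geom_col() +
  labs(x = NULL, y = "Number")

print(p)
```

With the sample data this should leave Portugal and Poland as their own bars and collapse Israel, South Africa and Austria into one "Others" bar.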
Upvotes: 0