Reputation: 97
I have a dataset of countries' GNI per capita and suicide numbers, and I am trying to keep only the 10 countries with the highest GNI. My problem is that each country's GNI appears in several rows, because the dataset also holds suicide statistics per age group and gender for every country.
I tried the following code, using the top_n function from dplyr:
top_highest_gni <- df_filter_ages %>% group_by(as.numeric(as.character(GNI.per.capita..PPP..current.international.....NY.GNP.PCAP.PP.CD.))) %>%
top_n(10)
However, this does not affect my dataset at all, and no error message comes up. I am not sure why. Any help on this would be greatly appreciated!
The data looks like this:
Country Year Sex GNI
Albania 2012 F 290000
Albania 2012 M 290000
UK 2012 F 2222222222
UK 2012 M 2222222222
Edit
As suggested, I added the summarise function and ran this code:
df_filter_ages %>%
group_by(country) %>%
summarise(max = max(as.numeric(as.character(GNI.per.capita..PPP..current.international.....NY.GNP.PCAP.PP.CD.)))) %>%
top_n(2)
The output is:
Selecting by max
max
1 119330
Desired output:
Country Year Sex GNI
UK 2012 F 2222222222
UK 2012 M 2222222222
Albania 2012 F 290000
Albania 2012 M 290000
Upvotes: 4
Views: 625
Reputation: 468
Try including the summarise() function after the group_by() function and before the top_n() function.
Example:
library(dplyr)

df <- data.frame(x = c(1, 2, 3),
                 y = c(4, 5, 6),
                 z = c(1, 20, 50))
df %>%
  group_by(x) %>%
  summarise(max = max(z)) %>%
  top_n(2)
# A tibble: 2 x 2
#       x   max
#   <dbl> <dbl>
# 1     2    20
# 2     3    50
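A note on why the original attempt changes nothing: grouping by the GNI value itself puts rows with the same GNI into the same group, and top_n(10) is then applied within each group, so every group with 10 or fewer rows is returned in full. If the goal is to keep all original rows for the top countries (as in the desired output), one possible approach is to find the top countries first and then filter the full data against them. The following is only a sketch on made-up toy data: the column names Country and GNI, the extra country "Chad" and its values are assumptions for illustration. If the real GNI column is stored as a factor, it would first need the as.numeric(as.character(...)) conversion shown in the question.

```r
library(dplyr)

# Hypothetical toy data mirroring the layout in the question
df <- data.frame(
  Country = c("Albania", "Albania", "UK", "UK", "Chad", "Chad"),
  Year    = 2012,
  Sex     = c("F", "M", "F", "M", "F", "M"),
  GNI     = c(290000, 290000, 2222222222, 2222222222, 1000, 1000)
)

# Step 1: collapse to one row per country, then keep the 2 highest GNI values
top_countries <- df %>%
  group_by(Country) %>%
  summarise(GNI = max(GNI)) %>%
  top_n(2, GNI)

# Step 2: keep every original row belonging to those countries,
# ordered from highest to lowest GNI
result <- df %>%
  semi_join(top_countries, by = "Country") %>%
  arrange(desc(GNI), Country, Sex)

print(result)
```

Passing the column explicitly, as in top_n(2, GNI), avoids relying on top_n() picking the last column by default. For the top 10 countries in the real data, the 2 would be replaced by 10.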
Upvotes: 2