Jordans

Reputation: 97

Trying to filter the top 10 entries using R?

I have a dataset of countries' GNI per capita and suicide numbers, and I am trying to keep only the 10 countries with the highest GNI. My problem is that each country's GNI appears in several rows, because the dataset also holds suicide statistics per age group and sex for every country.

I tried the following code using the top_n function in dplyr:

top_highest_gni <- df_filter_ages %>%
  group_by(as.numeric(as.character(GNI.per.capita..PPP..current.international.....NY.GNP.PCAP.PP.CD.))) %>%
  top_n(10)

However, this does not affect my dataset at all, no error message comes up, and I am not sure why. Any help on this would be greatly appreciated!

Data looks like this :

Country   Year   Sex  GNI
Albania   2012   F    290000
Albania   2012   M    290000
UK        2012   F    2222222222
UK        2012   M    2222222222

Edit

As suggested I added the summarise function and ran this code :

df_filter_ages %>%
  group_by(country) %>% 
  summarise(max = max(as.numeric(as.character(GNI.per.capita..PPP..current.international.....NY.GNP.PCAP.PP.CD.)))) %>% 
  top_n(2)

And output is :

Selecting by max
     max
1 119330

Desired output:

Country   Year   Sex  GNI

UK        2012   F    2222222222
UK        2012   M    2222222222
Albania   2012   F    290000
Albania   2012   M    290000

Upvotes: 4

Views: 625

Answers (1)

Odysseus210

Reputation: 468

Try including the summarise() function after the group_by() function and before the top_n() function.

Example:

df <- data.frame(x = c(1, 2, 3), 
                 y = c(4, 5, 6), 
                 z = c(1, 20, 50))

df %>%
  group_by(x) %>% 
  summarise(max = max(z)) %>% 
  top_n(2)

# A tibble: 2 x 2
#       x   max
#   <dbl> <dbl>
# 1     2    20
# 2     3    50
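
To get the desired output from the question (every row for the top countries, not one summary row per country), one possible approach is to first work out which countries are in the top N and then filter the original rows against that list. The sketch below is only an illustration under assumptions: the data frame, the short column names (Country, Year, Sex, GNI), the made-up "Exampleland" rows and n_top are all hypothetical stand-ins for df_filter_ages and its long GNI column. The original attempt probably changed nothing because top_n() works within each group, and grouping by the GNI value itself leaves groups that each hold 10 rows or fewer, so every row is kept.

```r
library(dplyr)

# Hypothetical sample shaped like the data in the question. GNI is stored
# as character to mimic a column that has to be converted to numeric.
df <- data.frame(
  Country = c("Albania", "Albania", "UK", "UK", "Exampleland", "Exampleland"),
  Year    = 2012,
  Sex     = c("F", "M", "F", "M", "F", "M"),
  GNI     = c("290000", "290000", "2222222222", "2222222222", "1000", "1000"),
  stringsAsFactors = FALSE
)

n_top <- 2  # hypothetical; use 10 for the real data

# Convert GNI once so both steps below work on numbers.
df_num <- df %>%
  dplyr::mutate(GNI = as.numeric(as.character(GNI)))

# Step 1: collapse to one row per country, then keep the n_top highest.
top_countries <- df_num %>%
  dplyr::group_by(Country) %>%
  dplyr::summarise(GNI = max(GNI, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(dplyr::desc(GNI)) %>%
  head(n_top)

# Step 2: keep every original row (all sex / age-group rows) for those
# countries, ordered from highest to lowest GNI.
top_highest_gni <- df_num %>%
  dplyr::filter(Country %in% top_countries$Country) %>%
  dplyr::arrange(dplyr::desc(GNI), Country, Sex)

print(top_highest_gni)
```

The dplyr:: prefixes are deliberate. If summarise() returns a single row with no country column, as in the edit to the question, one possible cause is that plyr was loaded after dplyr and its summarise() is masking dplyr's; calling dplyr::summarise() explicitly rules that out. On dplyr 1.0.0 and later, slice_max() could replace the arrange() and head() pair in step 1.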

Upvotes: 2
