Destry3

Reputation: 43

How to remove duplicate rows in R?

I have a data frame in R (for anyone familiar with the tidyverse, it's the starwars sample dataset).

I'm trying to create a tibble with two columns: homeworld and shortest_5 (the average height of the 5 shortest people from that homeworld).

Below is my code:

df<-starwars %>%
  group_by(homeworld) %>%
  filter(!is.na(height), !is.na(homeworld)) %>%
  arrange(desc(height)) %>%
  mutate(last5mean = mean(tail(height, 5))) %>%
  summarize(shortest_5=last5mean, number=n()) %>%
  filter(number>=5, ) 
df

This seems to work (though it is quite messy). My problem is that although the tibble does list homeworld and shortest_5, the same homeworld appears in multiple rows.


Seems like a simple fix but I can't quite wrap my head around it! Any help would be really appreciated!

Upvotes: 2

Views: 1973

Answers (2)

Andy

Reputation: 475

You can get rid of duplicate data using the duplicated() function.

For example:

df <- c(1,1,2,3,4,4,5,6,10,10,10)

Check which values are duplicated

df[duplicated(df)] # returns 1 4 10 10: every occurrence after the first is flagged

Note: if df is a data frame with more than one column, add a comma so that you index rows: New_DF <- df[!duplicated(df), ]

Remove duplicates

New_DF <- df[!duplicated(df)] # all duplicate data removed
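
The example above uses a plain vector. As a minimal sketch of the same idea on a data frame, the snippet below uses a made-up table (the names `toy`, `dup_rows`, and `toy_unique` and all values are hypothetical, not taken from the question):

```r
# Hypothetical toy data frame; values are placeholders for illustration only
toy <- data.frame(
  homeworld  = c("A", "A", "B", "B", "A"),
  shortest_5 = c(1, 1, 2, 2, 1)
)

# duplicated() on a data frame flags each row that repeats an earlier row
dup_rows <- duplicated(toy)

# The trailing comma indexes rows and keeps every column
toy_unique <- toy[!dup_rows, ]

# unique() drops the repeated rows in a single call
toy_unique2 <- unique(toy)

print(toy_unique)
```

Note that this only removes rows that are identical in every column; it does not address why the repeated rows were produced in the first place.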

Upvotes: 3

deschen

Reputation: 10996

You can considerably shorten your code:

df<-starwars %>%
  group_by(homeworld) %>%
  filter(!is.na(height), !is.na(homeworld), n() >=5) %>%
  summarize(shortest_5 = mean(if_else(rank(height) > 5, NA_integer_, height), na.rm = TRUE))

df

# # A tibble: 2 x 2
#   homeworld shortest_5
#   <chr>          <dbl>
# 1 Naboo           151.
# 2 Tatooine        153.

Note:

  • I get different results from yours. For example, on Naboo the 5 shortest characters have heights 96, 157, 165, 165, 170, and the mean of these 5 values is 150.6.
  • You shouldn't have values for e.g. Coruscant, since there are only 3 characters from that homeworld. The only two homeworlds with at least 5 characters are Naboo and Tatooine.
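
The Naboo figure in the note can be checked directly. The sketch below also shows another possible way to get one row per homeworld, using slice_min(). This is an alternative to the if_else()/rank() approach, not the method used in this answer. It assumes dplyr 1.0.0 or later, and it counts the group size after dropping missing heights, which differs slightly from the code above; the names `naboo_shortest` and `result` are introduced here for illustration.

```r
library(dplyr)

# Arithmetic check of the Naboo heights quoted in the note
naboo_shortest <- c(96, 157, 165, 165, 170)
mean(naboo_shortest)  # 150.6

# Alternative sketch: keep the 5 smallest heights per homeworld, then average
result <- starwars %>%
  filter(!is.na(height), !is.na(homeworld)) %>%  # drop missing values first
  group_by(homeworld) %>%
  filter(n() >= 5) %>%                           # keep homeworlds with at least 5 known heights
  slice_min(height, n = 5, with_ties = FALSE) %>%  # exactly 5 rows per group, even with tied heights
  summarize(shortest_5 = mean(height))           # one summary row per homeworld

result
```

Because summarize() receives a single value per group here, each homeworld appears only once in the result.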

Upvotes: 2
