SusanBlackmore13

Reputation: 25

Conditional distinct rows in R

I have a data frame in which I want to remove duplicate string entries in one column, unless the entries are short (str_length < 7), in which case the duplicates should stay. I also want to keep all the other columns.

So I have

string other columns
"abc"
"abc"
"centauri"
"centauri"
"armageddon"
"armageddon"
"spaghetti"

Desired output:

string other columns
"abc"
"abc"
"centauri"
"armageddon"
"spaghetti"

I have tried a variety of dplyr approaches, but none of them gives the desired output. For example:

df <- df %>%
  mutate(len = str_length(string)) %>%
  group_by(string, len) %>%
  filter(len > 7) %>%
  distinct(.keep_all = TRUE)

With this code, the short rows removed by filter() never come back. What I want is to protect those short rows from distinct() and still have them in the resulting data frame.

Upvotes: 1

Views: 1069

Answers (1)

akrun

Reputation: 887951

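We can combine duplicated() with nchar() and drop a row only when it is both a repeat of an earlier string and longer than 7 characters: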

df1[!(duplicated(df1$string) & nchar(df1$string) > 7), , drop = FALSE]

-output

#      string
#1        abc
#2        abc
#3   centauri
#5 armageddon
#7  spaghetti
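
To see why this keeps the short duplicates, it may help to look at the two logical vectors separately. The following is only a sketch on the same strings; the `id` column is a hypothetical stand-in for the "other columns" mentioned in the question, and the names `is_dup`, `is_long`, `keep` and `result` are introduced here for illustration.

```r
# Same strings as in the answer's data, plus a hypothetical `id` column
# standing in for the "other columns" from the question.
df1 <- data.frame(
  string = c("abc", "abc", "centauri", "centauri",
             "armageddon", "armageddon", "spaghetti"),
  id = 1:7,
  stringsAsFactors = FALSE
)

is_dup  <- duplicated(df1$string)  # TRUE for every repeat after the first occurrence
is_long <- nchar(df1$string) > 7   # TRUE for strings longer than 7 characters

# A row is dropped only when it is both a repeat and long,
# so repeated short strings such as "abc" survive.
keep <- !(is_dup & is_long)

result <- df1[keep, , drop = FALSE]
result
```

Because the subsetting acts on whole rows, every other column is carried along without any extra work.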

Or with filter() from dplyr:

library(dplyr)
df1 %>%
   filter(!(duplicated(string) & nchar(string) > 7))
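
One detail worth checking: the question describes short strings as str_length < 7, while its code (and the snippets here) test > 7. The two rules differ only for strings of exactly 7 characters. If the wording in the question is the intended rule, a variant along these lines might be closer. This is a sketch under that assumption; the `min_len` variable, the `id` column and the sample strings are hypothetical and chosen to hit the boundary case.

```r
library(dplyr)

# Hypothetical sample: "centaur" has exactly 7 characters.
df1 <- data.frame(
  string = c("abc", "abc", "centaur", "centaur", "spaghetti"),
  id = 1:5,
  stringsAsFactors = FALSE
)

# Assumed threshold: strings shorter than this are protected from deduplication.
min_len <- 7

# Drop a row only when it repeats an earlier string AND has at least min_len characters.
result <- df1 %>%
  filter(!(duplicated(string) & nchar(string) >= min_len))

result
```

With >= the second "centaur" is removed; with the original > 7 test it would be kept.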

data

df1 <- structure(list(string = c("abc", "abc", "centauri", "centauri", 
"armageddon", "armageddon", "spaghetti")), class = "data.frame", 
row.names = c(NA, -7L))

Upvotes: 2
