Sebastian Ettner

Reputation: 53

Filtering Rows matching String condition in R

I'm having trouble filtering for duplicate elements in a string. My data look similar to this:

idvisit     path
1           1,16,23,59,16
2           2,14,19,14
3           5,19,23
4           10,21
5           23,27,29,23

I have a column containing a unique ID and a column containing a path of web page navigation. In some rows of the path column, a page was accessed two or more times with different pages visited in between. I want to filter() for the rows where a page occurs two or more times and at least one other page lies between the two accesses, so the result should look like this:

idvisit     path
1           1,16,23,59,16
2           2,14,19,14
5           23,27,29,23

I only want to keep the rows that match this condition. I really don't know how to handle a string like this, where the repeated page can be any of many different numbers.
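To make the example reproducible, the sample data could be built roughly as follows. The name df1 is an assumption chosen to match the answers below, and stringsAsFactors = FALSE is set so that path stays a character column.

```r
# Sample data from the question; df1 is an assumed name
df1 <- data.frame(
  idvisit = 1:5,
  path = c("1,16,23,59,16", "2,14,19,14", "5,19,23", "10,21", "23,27,29,23"),
  stringsAsFactors = FALSE  # keep path as character so strsplit() works directly
)
str(df1)
```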

Upvotes: 0

Views: 145

Answers (3)

akrun

Reputation: 886938

We can try

library(data.table)
lst <- strsplit(df1$path, ",")
df1[lengths(lst) != sapply(lst, uniqueN),]
#  idvisit          path
#1       1 1,16,23,59,16
#2       2    2,14,19,14
#5       5   23,27,29,23

Or an option using tidyverse

library(tidyverse)
# One row per page, keep visits with a repeated page, then rebuild the path
separate_rows(df1, path) %>% 
     group_by(idvisit) %>% 
     filter(n_distinct(path) != n()) %>% 
     summarise(path = paste(path, collapse = ","))

Upvotes: 1

Sandipan Dey

Reputation: 23101

You could try regular expressions too with grepl:

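# Bound the captured page ID by commas or the string ends, so that e.g. 1 cannot match the start of 12
df[grepl('(^|,)([0-9]+),.*,\\2(,|$)', as.character(df$path), perl = TRUE),]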
#  idvisit          path
#1       1 1,16,23,59,16
#2       2    2,14,19,14
#5       5   23,27,29,23
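With a backreference approach like this, it matters whether the captured digits are bounded by commas or the ends of the string. The sketch below compares an unbounded capture with a bounded one on a few hypothetical test paths (the paths and the names loose and strict are made up for illustration). Both calls use perl = TRUE, which is an assumption about the preferred regex engine.

```r
# Hypothetical test paths; "1,5,12" contains no repeated page
paths <- c("1,16,23,59,16", "1,5,12", "3,3,4", "3,4,3")

# Unbounded capture: the digits may match only part of a longer number
loose <- grepl(".*([0-9]+),.*,\\1", paths, perl = TRUE)

# Bounded capture: start-or-comma on the left, comma-or-end on the right
strict <- grepl("(^|,)([0-9]+),.*,\\2(,|$)", paths, perl = TRUE)

# Side-by-side comparison of both patterns
data.frame(path = paths, loose = loose, strict = strict)
```

The unbounded pattern flags "1,5,12" because the captured 1 matches the first digit of 12, while the bounded pattern does not. Both patterns require two commas between the occurrences, so an immediate repeat such as "3,3,4" is not matched.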

Upvotes: 0

Sotos

Reputation: 51582

You can filter based on the number of elements in each string. A path with duplicated entries has more elements than it has unique elements, i.e.

df1[sapply(strsplit(as.character(df1$path), ','), function(i) length(unique(i)) != length(i)),]
#  idvisit          path
#1       1 1,16,23,59,16
#2       2    2,14,19,14
#5       5   23,27,29,23
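Comparing lengths also flags immediate repeats such as "3,3,4", where no other page lies between the two accesses. If the "at least one page in between" condition from the question has to be enforced strictly, a base R sketch along these lines might work. The helper has_separated_repeat is a hypothetical name, and df1 is rebuilt here only to keep the example self-contained.

```r
# Hypothetical helper: TRUE if some page recurs with a different page in between
has_separated_repeat <- function(path) {
  pages <- strsplit(as.character(path), ",", fixed = TRUE)[[1]]
  # Positions at which each distinct page occurs
  positions <- split(seq_along(pages), pages)
  # Occurrences form one contiguous run exactly when max - min + 1 equals
  # their count, so max - min >= count means another page sits in between
  any(vapply(positions, function(p) max(p) - min(p) >= length(p), logical(1)))
}

# Sample data from the question (df1 is an assumed name)
df1 <- data.frame(
  idvisit = 1:5,
  path = c("1,16,23,59,16", "2,14,19,14", "5,19,23", "10,21", "23,27,29,23"),
  stringsAsFactors = FALSE
)

keep <- vapply(df1$path, has_separated_repeat, logical(1), USE.NAMES = FALSE)
df1[keep, ]
```

On the sample data this keeps the same rows as the length comparison; the two approaches only differ for paths whose repeats are all adjacent.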

Upvotes: 1
