Reputation: 53
I've got some problems filtering for duplicate elements in a string. My data looks similar to this:
idvisit  path
1        1,16,23,59,16
2        2,14,19,14
3        5,19,23
4        10,21
5        23,27,29,23
I have a column containing a unique ID and a column containing the navigation path through web pages. The path column contains cases where a page was accessed twice or more, with some different pages between those accesses. I want to filter() the rows where a page occurs twice or more and at least one other page lies between the two accesses, so the data should look like this:
idvisit  path
1        1,16,23,59,16
2        2,14,19,14
5        23,27,29,23
I just want to keep only the rows that match these conditions. I really don't know how to handle a string with a variable number of different values.
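A reproducible version of the sample data (the answers below assume a data frame named df1):
df1 <- data.frame(
  idvisit = 1:5,
  path = c("1,16,23,59,16", "2,14,19,14", "5,19,23", "10,21", "23,27,29,23"),
  stringsAsFactors = FALSE
)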
Upvotes: 0
Views: 145
Reputation: 886938
We can try splitting each path and comparing the total number of elements with the number of distinct ones:
library(data.table)
# split each path into its individual page IDs
lst <- strsplit(df1$path, ",")
# keep rows where the total count differs from the distinct count, i.e. a page repeats
df1[lengths(lst) != sapply(lst, uniqueN),]
# idvisit path
#1 1 1,16,23,59,16
#2 2 2,14,19,14
#5 5 23,27,29,23
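The intermediate values make the comparison easy to see:
lengths(lst)         # 5 4 3 2 4  -- total pages per visit
sapply(lst, uniqueN) # 4 3 3 2 3  -- distinct pages per visit
Rows where the two counts differ are the ones with a repeated page.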
Or an option using the tidyverse:
library(tidyverse)
separate_rows(df1, path) %>%          # one row per visited page
  group_by(idvisit) %>%
  filter(n_distinct(path) != n()) %>% # keep visits containing a repeated page
  summarise(path = toString(path))    # collapse back to one row per visit
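Note that toString() joins with ", " (comma plus space), so the rebuilt paths gain a space after each comma. To reproduce the original formatting exactly, the last step could be:
summarise(path = paste(path, collapse = ","))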
Upvotes: 1
Reputation: 23101
You could try regular expressions too, with grepl. The word boundaries stop the backreference from matching just part of a number (e.g. '1' inside '13'), and the two required commas guarantee at least one other page between the repeated occurrences:
df1[grepl('\\b([0-9]+),.*,\\1\\b', as.character(df1$path)),]
# idvisit path
#1 1 1,16,23,59,16
#2 2 2,14,19,14
#5 5 23,27,29,23
Upvotes: 0
Reputation: 51582
You can filter based on the number of elements in each string: a path with duplicated entries has more elements in total than distinct ones, i.e.
df1[sapply(strsplit(as.character(df1$path), ','),
           function(i) length(unique(i)) != length(i)), ]
# idvisit path
#1 1 1,16,23,59,16
#2 2 2,14,19,14
#5 5 23,27,29,23
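A slightly shorter variant uses base R's anyDuplicated(), which returns the index of the first repeated element (0 if there is none):
df1[sapply(strsplit(as.character(df1$path), ','), anyDuplicated) > 0, ]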
Upvotes: 1