Ashish Kapil

Reputation: 100

How can I find the same/duplicate elements (each containing more than one word) in a list?

For example, I have a list:

lst = ["abc bca","bca abc","cde def"]

I want to treat the elements "abc bca" and "bca abc" as the same (duplicates). What approach should I use?

Upvotes: 0

Views: 85

Answers (4)

alex

Reputation: 7471

I'm not sure what you mean exactly by "I want to consider the elements the same", but you could use this approach if you wanted to return a set of "unique" items:

original_list = ["abc bca", "bca abc", "cde def"]
modified_list = []

for original_one_item in original_list:
    original_one_items = original_one_item.split(' ')
    original_one_items.sort()
    modified_list.append(" ".join(original_one_items))

modified_list = set(modified_list)

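This normalizes each item by sorting its words, so "bca abc" collapses into "abc bca" and modified_list ends up as the set {"abc bca", "cde def"}. The original list is left unchanged.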
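If you need a list rather than a set, and want to keep the original spelling and order of the first occurrence, the same idea can be sketched as below. The function name `unique_ignoring_word_order` is made up for illustration; this is one possible variation, not the only way to do it.

```python
def unique_ignoring_word_order(items):
    """Drop items whose words match an earlier item, regardless of word order."""
    seen = set()
    result = []
    for item in items:
        # Canonical form: the item's words in sorted order.
        key = " ".join(sorted(item.split()))
        if key not in seen:
            seen.add(key)
            result.append(item)  # keep the original string, not the sorted one
    return result


original_list = ["abc bca", "bca abc", "cde def"]
print(unique_ignoring_word_order(original_list))  # ['abc bca', 'cde def']
```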

Upvotes: 0

Cory Kramer

Reputation: 118031

>>> [' '.join(j) for j in set(tuple(sorted(i.split())) for i in lst)]
['abc bca', 'cde def']

The way this works is by first splitting the strings on whitespace:

>>> [i.split() for i in lst]
[['abc', 'bca'], ['bca', 'abc'], ['cde', 'def']]

Then sort each sublist and convert it to a tuple:

>>> [tuple(sorted(i.split())) for i in lst]
[('abc', 'bca'), ('abc', 'bca'), ('cde', 'def')]

Lastly, you can create a set, since we converted each sublist to a tuple, which is hashable (whereas a list is not):

>>> set(tuple(sorted(i.split())) for i in lst)
{('abc', 'bca'), ('cde', 'def')}

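The outermost list comprehension simply uses join to turn each tuple back into a whitespace-joined string (with its words in sorted order). Because a set is unordered, the order of the resulting list is not guaranteed.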
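If the goal is to see which of the original strings are duplicates of each other, rather than to remove them, the same sorted-tuple key can be used to group the items. This is only a sketch; the helper name `group_by_words` is hypothetical.

```python
from collections import defaultdict


def group_by_words(items):
    """Map each sorted word tuple to the original strings that produce it."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(sorted(item.split()))].append(item)
    return dict(groups)


lst = ["abc bca", "bca abc", "cde def"]
groups = group_by_words(lst)

# Keep only the groups that contain more than one original string.
duplicates = [members for members in groups.values() if len(members) > 1]
print(duplicates)  # [['abc bca', 'bca abc']]
```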

Upvotes: 3

Adam

Reputation: 2361

You can convert your strings to sets of words:

>>> lst = ["abc bca","bca abc","cde def"]
>>> new_lst = [frozenset(x.split(' ')) for x in lst]

Then you can use any method of finding duplicates in a list, for example collections.Counter:

>>> import collections
>>> print [item for item, count in collections.Counter(new_lst).items() if count > 1]
[frozenset(['abc', 'bca'])]
>>>
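One caveat with a frozenset key is that repeated words are ignored, so "abc abc bca" would be treated the same as "abc bca". If that matters, a key that also records word counts can be sketched like this (the names `set_key` and `multiset_key` are made up for the example):

```python
from collections import Counter


def set_key(text):
    """Key that ignores both word order and repeated words."""
    return frozenset(text.split())


def multiset_key(text):
    """Key that ignores word order but keeps each word's count."""
    return frozenset(Counter(text.split()).items())


print(set_key("abc abc bca") == set_key("abc bca"))            # True
print(multiset_key("abc abc bca") == multiset_key("abc bca"))  # False
print(multiset_key("abc bca") == multiset_key("bca abc"))      # True
```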

Upvotes: 0

harshil9968

Reputation: 3244

>>> from collections import Counter
>>> lst = ["abc bca","bca abc","cde def"]
>>> c = Counter(lst)
>>> c
Counter({'abc bca': 1, 'cde def': 1, 'bca abc': 1})
>>> for i in c:
...     if c[i]>1:
...             print i
... 
>>> lst = ["abc","bca","bca","abc","cde","def"]
>>> c = Counter(lst)
>>> for i in c:
...     if c[i]>1:
...             print i
... 
abc
bca
>>> 
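As the first Counter above shows, counting the raw strings finds no duplicates, because "abc bca" and "bca abc" are different strings. A minimal sketch that keeps the Counter approach but counts a normalized form of each element (its words sorted and re-joined, which is an assumption about what "duplicate" should mean here):

```python
from collections import Counter

lst = ["abc bca", "bca abc", "cde def"]

# Count each element by its words in sorted order, so word order is ignored.
counts = Counter(" ".join(sorted(s.split())) for s in lst)

duplicated = [key for key, n in counts.items() if n > 1]
print(duplicated)  # ['abc bca']
```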

Upvotes: 1
