Jshee

Reputation: 2686

Find duplicates in string, and return single result for only duplicates

I've seen many examples on here, but I haven't been able to find one that fits my scenario.

I'm trying to take a string like:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"

and return only one value for each duplicate.

So the output would be like (ignoring punctuation):

Bill
coding

How can I accomplish this in Python 3?

Upvotes: 0

Views: 117

Answers (7)

Pankaj Pandey

Reputation: 190

Use https://github.com/Alir3z4/python-stop-words to filter out common words, then count what remains:

from stop_words import get_stop_words

stop_words = get_stop_words('english')
s = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = s.split()
word_map = {}
for word in words:
    # Drop surrounding whitespace and commas before counting
    word = word.strip().replace(',', '')
    if word not in stop_words:
        word_map[word] = word_map.get(word, 0) + 1
# Print only the words that were seen more than once
for word, count in word_map.items():
    if count > 1:
        print(word)
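For trying the same idea without installing the package, here is a minimal sketch that assumes a tiny hand-written stop-word set. `SAMPLE_STOP_WORDS` and `repeated_non_stop_words` are hypothetical names, not part of python-stop-words, and the comparison is lowercased so that "Hi" matches "hi".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only;
# a real list (such as the one from python-stop-words) is much longer.
SAMPLE_STOP_WORDS = {"hi", "my", "is", "a", "the"}


def repeated_non_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return words that occur more than once, skipping stop words."""
    words = re.findall(r"\w+", text)
    # Compare in lowercase so capitalised stop words are filtered too.
    counts = Counter(w for w in words if w.lower() not in stop_words)
    return [w for w, n in counts.items() if n > 1]


print(repeated_non_stop_words(
    "Hi my Name is Bill, Bill likes coding, coding is fun"))
```

With this sample set, "is" is filtered out before counting, which matches the output asked for in the question.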

Upvotes: 0

JustDucky

Reputation: 132

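import string

def result(x):  # x is the input string
    repeated = []
    # Strip surrounding punctuation so "Bill," and "Bill" count as the same word
    listed = [word.strip(string.punctuation) for word in x.split()]
    for each in listed:
        number = listed.count(each)
        if number > 1:
            repeated.append(each)

    return set(repeated)  # a set cannot hold repeated values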
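Calling `list.count` for every word rescans the list each time. A single-pass variant is sketched below, assuming that words are separated by whitespace and that only surrounding punctuation needs stripping; `first_repeats` is a hypothetical name. It returns a list in the order in which each word is first seen a second time.

```python
import string


def first_repeats(text):
    """Return repeated words, in the order their second occurrence appears."""
    seen = set()
    repeats = []
    for raw in text.split():
        word = raw.strip(string.punctuation)
        if not word:
            continue  # the token was punctuation only
        if word in seen and word not in repeats:
            repeats.append(word)
        seen.add(word)
    return repeats


print(first_repeats("Hi my Name is Bill, Bill likes coding, coding is fun"))
```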

Upvotes: 0

Pavan

Reputation: 668

You can use a regex to extract the words while ignoring punctuation. Try this:

import re
import collections
sentence = "Hi my Name is Bill, Bill likes coding, coding is fun"
# Replace every non-word character with a space, then split on whitespace
wordList = re.sub(r"[^\w]", " ", sentence).split()
print([item for item, count in collections.Counter(wordList).items() if count > 1])

collections.Counter takes care of counting the repetitions.

Upvotes: 0

Seekheart

Reputation: 1173

If I understand correctly, you want to filter out duplicates? If so, you can do this:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"
string = string.replace(',' , '')
string = list(set(string.split()))
string = '\n'.join(string)
print(string)
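Note that a set yields every distinct word in arbitrary order, including words that occur only once. As a rough sketch of the difference, assuming commas are the only punctuation as in the example above, the following keeps first-seen order with `dict.fromkeys` and then narrows the result to the words that actually repeat. The variable names are illustrative.

```python
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = text.replace(",", "").split()

# dict.fromkeys drops duplicates while preserving first-seen order.
unique_words = list(dict.fromkeys(words))

# Keep only the words that occur more than once in the original list.
repeated_words = [w for w in unique_words if words.count(w) > 1]

print("\n".join(repeated_words))
```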

Upvotes: 0

KIDJourney

Reputation: 1220

Use re to replace punctuation with spaces:

import string
import re


text = "Hi my Name is Bill, Bill likes coding, coding is fun"

regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', text)

Use Counter to count the words:

from collections import Counter

out = out.split()

counter = Counter(out)

ans = [word for word, count in counter.items() if count > 1]

print(ans)
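As an alternative to the regex, the punctuation step can also be done with `str.translate`. The sketch below assumes ASCII punctuation only (the characters in `string.punctuation`); `repeated_words_translate` is a hypothetical name.

```python
import string
from collections import Counter


def repeated_words_translate(text):
    """Return words occurring more than once, using str.translate."""
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    counts = Counter(words)
    return [word for word, count in counts.items() if count > 1]


print(repeated_words_translate(
    "Hi my Name is Bill, Bill likes coding, coding is fun"))
```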

Upvotes: 0

Idos

Reputation: 15320

Split the string into words, count them with Counter, and then print only the words that appear more than once (count > 1):

>>> import collections
>>> import re
>>> string = "Hi my Name is Bill, Bill likes coding, coding is fun"
>>> words = re.sub(r"[^\w]", " ", string).split()
>>> word_counts = collections.Counter(words)
>>> for word, count in word_counts.items():
...     if count > 1:
...         print(word)
...

Outputs:

is
Bill
coding

Upvotes: 2

Sven Marnach

Reputation: 602205

Split your string into words. There are different ways of doing this, depending on requirements. Here's one way:

import re
words = re.findall(r'\w+', string)

Count the frequency of the words:

import collections
word_counts = collections.Counter(words)

Get all the words that appear more than once:

result = [word for word in word_counts if word_counts[word] > 1]
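Put together, the three steps above can be sketched as one function. `find_repeated_words` and its `ignore_case` parameter are illustrative additions, not part of the original answer.

```python
import re
from collections import Counter


def find_repeated_words(text, ignore_case=False):
    """Return the words that appear more than once in text."""
    if ignore_case:
        # Optional: treat "Bill" and "bill" as the same word.
        text = text.lower()
    words = re.findall(r"\w+", text)          # step 1: split into words
    word_counts = Counter(words)              # step 2: count frequencies
    # step 3: keep the words seen more than once
    return [word for word in word_counts if word_counts[word] > 1]


print(find_repeated_words(
    "Hi my Name is Bill, Bill likes coding, coding is fun"))
```

The result also contains "is", since it occurs twice in the example sentence; filter it out separately if common words should be ignored.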

Upvotes: 6
