Reputation: 2686
I've seen many examples on here, but I haven't been able to find one that fits my scenario.
I'm trying to take a string like:
string = "Hi my Name is Bill, Bill likes coding, coding is fun"
and return only one value for each duplicated word.
So the output would be (ignoring punctuation):
Bill
coding
How can I accomplish this in Python 3?
Upvotes: 0
Views: 117
Reputation: 190
Use https://github.com/Alir3z4/python-stop-words to filter out common words,
and then count what is left:
from stop_words import get_stop_words

stop_words = get_stop_words('english')
s = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = s.split()
word_map = {}
for word in words:
    # Drop surrounding whitespace and commas before counting.
    word = word.strip().replace(',', '')
    if word not in stop_words:
        word_map[word] = word_map.get(word, 0) + 1
for word, count in word_map.items():
    if count > 1:
        print(word)
Upvotes: 0
Reputation: 132
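import string

def result(x):  # input should be the string
    repeated = []
    # Strip punctuation so "Bill," and "Bill" count as the same word.
    listed = [word.strip(string.punctuation) for word in x.split()]
    for each in listed:
        number = listed.count(each)
        if number > 1:
            repeated.append(each)
    return set(repeated)  # a set can't contain repeated values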
Upvotes: 0
Reputation: 668
You can try using a regex to pick out the words while ignoring punctuation. Try this:
import re
import collections

sentence = "Hi my Name is Bill, Bill likes coding, coding is fun"
# Replace every non-word character with a space, then split on whitespace.
wordList = re.sub(r"[^\w]", " ", sentence).split()
print([item for item, count in collections.Counter(wordList).items() if count > 1])
collections.Counter takes care of counting the repetitions.
Upvotes: 0
Reputation: 1173
If I understand correctly, you want to filter out duplicates? If so, you can do this:
string = "Hi my Name is Bill, Bill likes coding, coding is fun"
string = string.replace(',' , '')
string = list(set(string.split()))
string = '\n'.join(string)
print(string)
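One thing to keep in mind: a set has no defined order, so the words may come out in a different order than they appear in the sentence. If order matters, a sketch like the one below keeps the first occurrence of each word by using dict.fromkeys instead of set. The helper name unique_words is made up for illustration, and like the code above it only strips commas.

```python
def unique_words(text):
    """Return each distinct word of text once, in order of first appearance."""
    # Same comma handling as above; other punctuation is left untouched.
    words = text.replace(",", "").split()
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(words))


text = "Hi my Name is Bill, Bill likes coding, coding is fun"
print("\n".join(unique_words(text)))
```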
Upvotes: 0
Reputation: 1220
Use re to replace punctuation:
import string
import re
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', text)
Use Counter to count:
from collections import Counter
out = out.split()
counter = Counter(out)
ans = [word for word, count in counter.items() if count > 1]
print(ans)
Upvotes: 0
Reputation: 15320
You can use Counter after you split your string into words, and then print only the words that appear more than once (count > 1):
>>> import collections
>>> import re
>>> string = "Hi my Name is Bill, Bill likes coding, coding is fun"
>>> words = re.sub(r"[^\w]", " ", string).split()
>>> word_counts = collections.Counter(words)
>>> for word, count in word_counts.items():
...     if count > 1:
...         print(word)
...
Outputs:
is
Bill
coding
Upvotes: 2
Reputation: 602205
Split your string into words. There are different ways of doing this, depending on requirements. Here's one way:
words = re.findall(r'\w+', string)
Count the frequency of the words:
word_counts = collections.Counter(words)
Get all the words that appear more than once:
result = [word for word in word_counts if word_counts[word] > 1]
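Putting the three steps together, a minimal sketch might look like the following. The function name repeated_words and the ignore parameter are made up for illustration and are not part of the steps above. ignore is just one possible way to drop common words such as "is", so that the result matches the output asked for in the question. Matching is case-sensitive, so "Bill" and "bill" would count as different words.

```python
import collections
import re


def repeated_words(text, ignore=()):
    """Return the words that occur more than once in text, in first-seen order.

    `ignore` is a hypothetical extra: an iterable of words to leave out.
    """
    ignored = set(ignore)
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped.
    words = re.findall(r"\w+", text)
    word_counts = collections.Counter(words)
    return [word for word, count in word_counts.items()
            if count > 1 and word not in ignored]


text = "Hi my Name is Bill, Bill likes coding, coding is fun"
print(repeated_words(text))                 # ['is', 'Bill', 'coding']
print(repeated_words(text, ignore={"is"}))  # ['Bill', 'coding']
```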
Upvotes: 6