Reputation: 425
I'm trying to find the unique IP addresses in a file using a regex. I find them fine, append each one to a list, and later call set() on the list to remove duplicates. Each item is found and there are duplicates, but the list never shrinks: printing the set gives the same output as printing ips as a list, and nothing is removed.
ips = [] # make a list
count = 0
count1 = 0
for line in f: #loop through file line by line
    match = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line) #find IPs
    if match: #if there's a match append and keep track of the total number of Ips
        ips.append(match) #append to list
        count = count + 1
ipset = set(ips)
print(ipset, count)
This string, <_sre.SRE_Match object; span=(0, 13), match='137.43.92.119'>, shows up 60+ times in the output, both before and after calling set() on the list.
Upvotes: 4
Views: 2895
Reputation: 1123410
You are not storing the matched strings; you are storing the re.Match objects. Match objects don't define their own equality, so two different match objects never compare equal even when they matched the same text, and a set treats each one as unique:
>>> import re
>>> line = '137.43.92.119\n'
>>> match1 = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line)
>>> match1
<_sre.SRE_Match object; span=(0, 13), match='137.43.92.119'>
>>> match2 = re.search("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line)
>>> match2
<_sre.SRE_Match object; span=(0, 13), match='137.43.92.119'>
>>> match1 == match2
False
Extract the matched text instead:
ips.append(match.group()) #append to list
matchobj.group() without arguments returns the part of the string that was matched (group 0):
>>> match1.group()
'137.43.92.119'
>>> match1.group() == match2.group()
True
Upvotes: 12