Reputation: 128
I have a list of words such as:
l = """abc
dfg
hij
jih
gfd
cba
cbd
jip
gfe
jiw
cbw"""
I want to find pairs of words out of this list, so the first word is:
.(.)(.)
And the second word is:
\2\1.
So the \1 and the \2 refer to the characters in the first word.
The best regexp that I could come up with is:
re.findall('(^.(?P<A>.)(?P<B>.)$)(?=.*(^(?P=B)(?P=A).$))', l, re.DOTALL | re.MULTILINE)
But this search returns only some of the pairs (because findall returns only non-overlapping results...). Then I thought of using positive lookbehind assertions, but they can only be used with strings of fixed length...
Is there a way to so this with regex?
Upvotes: 4
Views: 1570
Reputation: 95318
I doubt that regular expressions are a good way to do this (especially in Python, were you can't simply get all possible ways of matching a string like in Perl, so you'd have to call findall
on all prefixes of your string). A straightforward alternative would be:
words = l.split()
pairs = set(frozenset((w1, w2)) for w1 in words for w2 in words
if w1[1:] == w2[1::-1])
Results in:
>>> map(tuple, pairs)
[('hij', 'jip'),
('abc', 'cbd'),
('dfg', 'gfd'),
('dfg', 'gfe'),
('jiw', 'hij'),
('hij', 'jih'),
('abc', 'cbw'),
('abc', 'cba')]
You could also solve this really fast by saving the prefixes of the words in a dictionary in a first pass and then build up the associations in a second pass:
from collections import defaultdict
prefixes = defaultdict(list)
for w in words:
prefixes[w[1::-1]].append(w)
pairs = set(frozenset((w1, w2)) for w1 in words for w2 in prefixes[w1[1:]])
The performance of this will be hard to beat by a regular expression engine.
Upvotes: 2