Reputation: 13806
I have a list of strings.
A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]
I need to find every prefix that appears with all three of the suffixes .json, .mapping.json, and .analyzer.json.
Here, kite1 and kite3.mapping qualify, but kite2 doesn't, because it only appears with .json.
How can I find the prefixes that appear with all of .json, .mapping.json, and .analyzer.json?
Upvotes: 3
Views: 750
Reputation: 44376
If this were code-golf, I might win:
def ew(sx):
    return set([s[:-len(sx)] for s in A if s.endswith(sx)])

ew('.analyzer.json') & ew('.mapping.json') & ew('.json')
The ew() function loops through A, finding all elements that end with the given suffix and stripping the suffix off, returning the results as a set.
Using it, I just calculate the intersection of the sets produced from each of the three suffixes. (& is the operator for intersection.)
For brevity's sake, I abbreviated "ends with" to ew and "suffix" to sx.
The expression s[:-len(sx)] means "the substring of s starting at 0 and stopping len(sx) characters before the end", which has the effect of snipping the suffix off the end.
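For readers who would rather not depend on a global A, here is a minimal sketch of the same idea with the list and the suffixes passed in as arguments. The function name prefixes_with_all_suffixes is made up for illustration, and the sketch assumes at least one suffix is given and that no suffix is the empty string.

```python
A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]


def prefixes_with_all_suffixes(names, suffixes):
    """Return the prefixes that occur with every suffix (hypothetical helper)."""
    # One set of stripped prefixes per suffix, built the same way as ew().
    per_suffix = [
        {name[:-len(sx)] for name in names if name.endswith(sx)}
        for sx in suffixes
    ]
    # set.intersection needs at least one set, hence the non-empty assumption.
    return set.intersection(*per_suffix)


result = prefixes_with_all_suffixes(
    A, ['.json', '.mapping.json', '.analyzer.json']
)
print(sorted(result))  # ['kite1', 'kite3.mapping']
```

The empty-suffix assumption matters because s[:-0] is the same as s[:0], which is the empty string rather than the whole of s.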
Upvotes: 3
Reputation: 223
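import re

string = "\n".join(A)
# re.MULTILINE makes ^ and $ anchor each file name, so a suffix only counts at the end of a name
json_prefices = re.findall(r"^(.*?)\.json$", string, re.MULTILINE)
mapping_json_prefices = re.findall(r"^(.*?)\.mapping\.json$", string, re.MULTILINE)
analyzer_json_prefices = re.findall(r"^(.*?)\.analyzer\.json$", string, re.MULTILINE)
result = list(set(json_prefices) & set(mapping_json_prefices)
              & set(analyzer_json_prefices))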
Upvotes: 0
Reputation: 73450
Use re.match and capturing groups to extract all matches for each of your patterns. Then take the intersection of the resulting sets:
import re

s1, s2, s3 = (
    set(m.group(1) for m in (re.match(pattern, s) for s in A) if m)
    for pattern in (
        r'^(.+)\.json$',  # group(1) is the part within '()'
        r'^(.+)\.mapping\.json$',
        r'^(.+)\.analyzer\.json$'
    )
)
result = list(s1 & s2 & s3)  # intersection
# ['kite3.mapping', 'kite1'] (set order is arbitrary)
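If the suffixes are not known in advance, the patterns can be built from a plain list instead of being written by hand. This is only a sketch: the helper name prefixes_by_regex is hypothetical, it uses re.escape so the dots are matched literally, and it assumes a non-empty suffix list.

```python
import re

A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]


def prefixes_by_regex(names, suffixes):
    """Intersect the prefixes captured for each suffix (hypothetical helper)."""
    sets = []
    for suffix in suffixes:
        # fullmatch requires the whole name to match, so no ^ or $ is needed.
        pattern = re.compile(r'(.+)' + re.escape(suffix))
        sets.append(
            {m.group(1) for m in map(pattern.fullmatch, names) if m}
        )
    return set.intersection(*sets)


result = prefixes_by_regex(A, ['.json', '.mapping.json', '.analyzer.json'])
print(sorted(result))  # ['kite1', 'kite3.mapping']
```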
Upvotes: 1
Reputation: 33651
Well, all you need is to collect a set of prefixes for each suffix in ['.json', '.mapping.json', '.analyzer.json'] and then just take an intersection of these sets:
In [1]: A = [
   ...:     'kite1.json',
   ...:     'kite1.mapping.json',
   ...:     'kite1.analyzer.json',
   ...:     'kite2.json',
   ...:     'kite3.mapping.json',
   ...:     'kite3.mapping.mapping.json',
   ...:     'kite3.mapping.analyzer.json',
   ...: ]

In [2]: suffixes = ['.json', '.mapping.json', '.analyzer.json']

In [3]: prefixes = {s: set() for s in suffixes}

In [4]: for word in A:
   ...:     for suffix in suffixes:
   ...:         if word.endswith(suffix):
   ...:             prefixes[suffix].add(word[:-len(suffix)])
   ...:

In [5]: prefixes
Out[5]:
{'.analyzer.json': {'kite1', 'kite3.mapping'},
 '.json': {'kite1',
           'kite1.analyzer',
           'kite1.mapping',
           'kite2',
           'kite3.mapping',
           'kite3.mapping.analyzer',
           'kite3.mapping.mapping'},
 '.mapping.json': {'kite1', 'kite3', 'kite3.mapping'}}

In [6]: prefixes['.json'] & prefixes['.mapping.json'] & prefixes['.analyzer.json']
Out[6]: {'kite1', 'kite3.mapping'}
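The same bookkeeping can also be turned around: instead of one set of prefixes per suffix, keep one set of suffixes per prefix and pick the prefixes that collected all of them. This is a sketch of that variant rather than the session above; the name complete_prefixes is invented for illustration, and it assumes no suffix is the empty string.

```python
from collections import defaultdict

A = [
    'kite1.json',
    'kite1.mapping.json',
    'kite1.analyzer.json',
    'kite2.json',
    'kite3.mapping.json',
    'kite3.mapping.mapping.json',
    'kite3.mapping.analyzer.json',
]


def complete_prefixes(names, suffixes):
    """Return prefixes seen with every required suffix (hypothetical helper)."""
    required = set(suffixes)
    seen = defaultdict(set)  # prefix -> suffixes found with that prefix
    for name in names:
        for suffix in required:
            if name.endswith(suffix):
                seen[name[:-len(suffix)]].add(suffix)
    # Only suffixes from `required` are ever added, so equality means "all".
    return {prefix for prefix, found in seen.items() if found == required}


result = complete_prefixes(A, ['.json', '.mapping.json', '.analyzer.json'])
print(sorted(result))  # ['kite1', 'kite3.mapping']
```

One possible advantage of this layout is that seen also shows which suffixes are missing for an incomplete prefix such as kite2.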
Upvotes: 1