Reputation: 561
I am reading a .csv file and saving it to a matrix called csvfile, and the matrix contents look like this (abbreviated; there are dozens of records):
[['411-440854-0', '411-440824-0', '411-441232-0', '394-529791', '394-529729', '394-530626'], <...>, ['394-1022430-0', '394-1022431-0', '394-1022432-0', '***another CN with a switch in between'], ['394-833938-0', '394-833939-0', '394-833940-0'], <...>, ['394-1021830-0', '394-1021831-0', '394-1021832-0', '***Sectionalizer end connections'], ['394-1022736-0', '394-1022737-0', '394-1022738-0'], <...>, ['394-1986420-0', '394-1986419-0', '394-1986416-0', '***weird BN line check'], ['394-1986411-0', '394-1986415-0', '394-1986413-0'], <...>, ['394-529865-0', '394-529686-0', '394-530875-0', '***Sectionalizer end connections'], ['394-830900-0', '394-830904-0', '394-830902-0'], ['394-2350772-0', '394-2350776-0', '394-2350774-0', '***Sectionalizer present but no end break'], <...>]
I am also reading a text file into a variable called textfile, and the content looks like this:
...
object underground_line {
name SPU123-394-1021830-0-sectionalizer;
phases AN;
from SPU123-391-670003;
to SPU123-395-899674_sectionalizernode;
length 26.536;
configuration SPU123-1/0CN15-AN;
}
object underground_line {
name SPU123-394-1021831-0-sectionalizer;
phases BN;
from SPU123-391-670002;
to SPU123-395-899675_sectionalizernode;
length 17.902;
configuration SPU123-1/0CN15-BN;
}
object underground_line {
name SPU123-394-1028883-0-sectionalizer;
phases CN;
from SPU123-391-542651;
to SPU123-395-907325_sectionalizernode;
length 771.777;
configuration SPU123-1CN15-CN;
}
...
I want to see if a portion of the name line in textfile (anything after SPU123- and before -0-sectionalizer) exists in the csvfile matrix. If it does not exist, I want to do something (increment a counter). I tried several ways, including the one below:
counter = 0
for noline in textfile:
    if 'name SPU123-' in noline:
        if '-' in noline[23]:
            if ((noline[13:23] not in s[0]) and (noline[13:23] not in s[1]) and (noline[13:23] not in s[2]) for s in csvfile):
                counter = counter+1
        else:
            if ((noline[13:24] not in s[0]) and (noline[13:24] not in s[1]) and (noline[13:-24] not in s[2]) for s in csvfile):
                counter = counter+1
print counter
This is not working. I also tried with if any((noline......) in the above code sample, and it doesn't work either.
Upvotes: 0
Views: 76
Reputation: 23753
import re, itertools
Flatten csvfile (data is an iterator):
data = itertools.chain.from_iterable(csvfile)
Extract relevant items from data and make it a set for performance (avoid iterating over data multiple times)
data_rex = re.compile(r'\d{3}-\d+')
data = {match.group() for match in itertools.imap(data_rex.match, data) if match}
Count the names that are not in data.
def predicate(match, data = data):
    '''Return True if match not found in data'''
    return match.group(1) not in data
# after SPU123- and before -0-
name = re.compile(r'name SPU123-(\d{3}-\d+)-')
names = name.finditer(textfile)
# quantify
print sum(itertools.imap(predicate, names))
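The code above is Python 2 (itertools.imap and the print statement). A rough Python 3 sketch of the same idea follows, assuming textfile is a single string rather than a list of lines; the sample csvfile and textfile values are hypothetical, shortened from the question.

```python
import itertools
import re

# Hypothetical sample data, shortened from the question.
csvfile = [
    ['411-440854-0', '394-529791'],
    ['394-1021830-0', '394-1021831-0', '***Sectionalizer end connections'],
]
textfile = """object underground_line {
name SPU123-394-1021830-0-sectionalizer;
}
object underground_line {
name SPU123-394-1028883-0-sectionalizer;
}
"""

# Flatten the rows and keep only the leading "ddd-dddd..." part of each cell.
data_rex = re.compile(r'\d{3}-\d+')
matches = map(data_rex.match, itertools.chain.from_iterable(csvfile))
data = {m.group() for m in matches if m}

# Capture the part after "SPU123-" and before the next "-".
name_rex = re.compile(r'name SPU123-(\d{3}-\d+)-')

# True counts as 1, so summing the booleans counts the missing names.
missing = sum(m.group(1) not in data for m in name_rex.finditer(textfile))
print(missing)
```

In Python 3 the built-in map is already lazy, so it stands in for itertools.imap here.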
Upvotes: 0
Reputation: 36026
Since your matrix includes loads upon loads of values, it's very slow to iterate over it all each time.
Assemble your values into a hash-based collection instead (a set in this case, since there is no associated data), because hash table lookups are very fast:
import re
s = {v for r in matrix for v in r if re.match(r'\d[-\d]+\d$',v)} #or any filter more appropriate for your notion of valid identifiers
if noline[13:23] in s: #parsing the identifiers instead would be more fault-tolerant
    #do something
Due to the preliminary step, this will only start outperforming the brute-force approach beyond a certain scale.
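As one possible reading of "parsing the identifiers instead", here is a minimal Python 3 sketch that extracts the identifier with a regular expression rather than fixed slice positions, so leading whitespace or a different identifier length does not shift the result. The helper names (build_lookup, count_missing), the patterns, and the sample data are assumptions for illustration, not part of the question.

```python
import re

# Hypothetical patterns: the leading "394-1021830" part of a CSV cell,
# and the same part following "name SPU123-" in a text line.
ID_REX = re.compile(r'\d+-\d+')
NAME_REX = re.compile(r'name SPU123-(\d+-\d+)')


def build_lookup(rows):
    """Collect the leading identifier of every cell into a set."""
    found = set()
    for row in rows:
        for cell in row:
            m = ID_REX.match(cell)
            if m:  # comment cells such as '***...' do not match and are skipped
                found.add(m.group())
    return found


def count_missing(lines, lookup):
    """Count name lines whose identifier is absent from the lookup set."""
    count = 0
    for line in lines:
        m = NAME_REX.search(line)
        if m and m.group(1) not in lookup:
            count += 1
    return count


# Hypothetical sample data, shortened from the question.
sample_rows = [
    ['394-1021830-0', '394-1021831-0', '394-1021832-0',
     '***Sectionalizer end connections'],
    ['394-529791'],
]
sample_lines = [
    '\tname SPU123-394-1021830-0-sectionalizer;',
    'phases AN;',
    'name SPU123-394-1028883-0-sectionalizer;',
]

lookup = build_lookup(sample_rows)
print(count_missing(sample_lines, lookup))
```

The set is built once, so each name line costs a single hash lookup instead of a scan over every row.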
Upvotes: 1
Reputation: 5289
Checking for a string s in a list of lists l:
>>> l = [['str', 'foo'], ['bar', 'so']]
>>> s = 'foo'
>>> any(s in x for x in l)
True
>>> s = 'nope'
>>> any(s in x for x in l)
False
Implementing this in your code (assuming that noline[13:23] is the string you want to search for, and that counter should be incremented if it is not in csvfile):
counter = 0
for noline in textfile:
    if 'name SPU123-' in noline:
        if '-' in noline[23]:
            # identifier occupies noline[13:23]
            if not any(noline[13:23] in x for x in csvfile) and not any(noline[13:23] + '-0' in x for x in csvfile):
                counter += 1
        else:
            # identifier is one character longer
            if not any(noline[13:24] in x for x in csvfile) and not any(noline[13:24] + '-0' in x for x in csvfile):
                counter += 1
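A likely reason the condition in the question does not behave as intended: a parenthesized generator expression used directly in an if is just an object, and a generator object is always truthy, whatever it would yield. It has to be passed to any() or all() to be evaluated. A small Python 3 sketch with hypothetical data:

```python
rows = [['a', 'b'], ['c', 'd']]   # hypothetical stand-in for csvfile
needle = 'a'                      # present in the first row

# A bare generator expression is an object, so its truth value is True
# even though 'a' IS in one of the rows.
gen = (needle not in row for row in rows)
always_true = bool(gen)

# Wrapping the expression in all()/any() actually consumes the values.
absent_all = all(needle not in row for row in rows)
absent_any = not any(needle in row for row in rows)

print(always_true, absent_all, absent_any)
```

The last two forms are equivalent; not any(...) stops at the first row that contains the string.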
Upvotes: 1