Bibliotango
Bibliotango

Reputation: 193

Check if string begins with one of several substrings in Python

I couldn't figure out how to perform line.startswith("substring") for a set of substrings, so I tried a few variations on the code at bottom: since I have the luxury of known 4-character beginning substrings, but I'm pretty sure I've got the syntax wrong, since this doesn't reject any lines.

(Context: my aim is to throw out header lines when reading in a file. Header lines start with a limited set of strings, but I can't just check for the substring anywhere, because a valid content line may include a keyword later in the string.)

cleanLines = []
line = "sample input here"
if not line[0:3] in ["node", "path", "Path"]:  #skip standard headers
    cleanLines.append(line)

Upvotes: 1

Views: 2989

Answers (1)

inspectorG4dget
inspectorG4dget

Reputation: 113965

Your problem stems from the fact that string slicing is exclusive of the stop index:

In [7]: line = '0123456789'

In [8]: line[0:3]
Out[8]: '012'

In [9]: line[0:4]
Out[9]: '0123'

In [10]: line[:3]
Out[10]: '012'

In [11]: line[:4]
Out[11]: '0123'

Slicing a string between i and j returns the substring starting at i, and ending at (but not including) j.

Just to make your code run faster, you might want to test membership in sets, instead of in lists:

cleanLines = []
line = "sample input here"
blacklist = set(["node", "path", "Path"])
if line[:4] not in blacklist:  #skip standard headers
    cleanLines.append(line)

Now, what you're actually doing with that code is a startswith, which is not restricted by any length parameters:

In [12]: line = '0123456789'

In [13]: line.startswith('0')
Out[13]: True

In [14]: line.startswith('0123')
Out[14]: True

In [15]: line.startswith('03')
Out[15]: False

So you could do this to exclude headers:

cleanLines = []
line = "sample input here"
headers = ["node", "path", "Path"]
if not any(line.startswith(header) for header in headers) :  #skip standard headers
    cleanLines.append(line)

Upvotes: 2

Related Questions