Reputation: 281
So I was looking at ways to "grab" a certain part of a text file with Python when you only know what comes before and after that particular text. I want something like this answer but for single lines. For example, if I have a text file called test.txt that looks like:
This
is
my
test
file
Then I can use
with open('test.txt') as input_data:
    for line in input_data:
        if line.strip() == 'is':
            break
    for line in input_data:
        if line.strip() == 'test':
            break
        print(line)
...and that works fine for grabbing my, but if my text file is a single line, e.g.:
This is my test file
Then it doesn't work. I don't want to grab my by the string index because I want something that will work based only on knowing what comes before and after that part of the line. I tried looking at a lot of questions but haven't found anything.
Thank you!
Upvotes: 0
Views: 881
Reputation: 94
It looks like you want to grab the text between "is" and "test". A regular expression may help you, like this:
import re

with open('test.txt') as input_data:
    # Capture the text between a standalone "is" and the next "test"
    match = re.findall(r'\sis\s*(\w[\s\S]+?)\s*test', input_data.read())
    for item in match:
        print(item)
Upvotes: 1
Reputation: 1711
start = ' is '
end = ' test '
with open('test.txt') as input_data:
    for line in input_data:
        try:
            # str.index raises ValueError when the marker is missing
            start_index = line.index(start) + len(start)
            end_index = line.index(end)
            print(line[start_index:end_index])
        except ValueError:
            print("not found in this line [%s]" % line.rstrip())
You can use index to find the start word and the end word, and then slice out the substring between them.
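The same index-based idea can be wrapped in a small helper so it is easy to try on a plain string. This is only a sketch: the name text_between is made up for illustration, it uses str.find (which returns -1 instead of raising), and it looks for the end marker only after the start marker, which is a small change from the loop above.

```python
def text_between(line, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    start_pos = line.find(start)
    if start_pos == -1:
        return None  # start marker not present
    start_pos += len(start)
    # Only look for the end marker after the start marker
    end_pos = line.find(end, start_pos)
    if end_pos == -1:
        return None  # end marker not present after the start marker
    return line[start_pos:end_pos]


print(text_between('This is my test file', ' is ', ' test '))  # my
```

Returning None rather than printing a message leaves it to the caller to decide what to do with lines that do not contain both markers.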
Upvotes: 1
Reputation: 1967
You can get that with a regular expression:
import re

with open('test.txt') as input_data:
    for line in input_data:
        match = re.search(r' is (.*) test ', line)
        if match:
            print(line)
            print(match.group(1))
The re.search call looks for the pattern " is ... test ". If it is found, the code first prints the whole line and then only the string that is between "is" and "test". I wasn't sure which one you would prefer.
Edit: changed the regex to include a space before "is"; otherwise the "is" inside "This" would have been matched as well. Removed the lookahead and lookbehind since they are not necessary.
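To see why the leading space matters, here is a small check on a plain string instead of a file. It is only an illustration, using the sample sentence from the question; the variable names loose and strict are made up.

```python
import re

line = 'This is my test file'

# Without the leading space, the "is" inside "This" matches first,
# so the captured group also swallows the real "is".
loose = re.search(r'is (.*) test ', line)
print(repr(loose.group(1)))   # 'is my'

# With the leading space, only the standalone word "is" can match.
strict = re.search(r' is (.*) test ', line)
print(repr(strict.group(1)))  # 'my'
```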
Upvotes: 3
Reputation: 113864
Let's consider this test file:
$ cat testfile
This
is
my
test
file
this is your test file
To get both matches:
>>> import re
>>> re.findall(r'\bis\s+(.*?)\s+test\b', open('testfile').read())
['my', 'your']
If we want to be more careful about making sure that the file is closed, we should use with:
>>> with open('testfile') as f:
...     re.findall(r'\bis\s+(.*?)\s+test\b', f.read())
...
['my', 'your']
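If the surrounding words are not fixed, the same pattern can be built from variables. This is a rough sketch, not a definitive solution: find_between is a made-up name, re.escape is used so the markers are treated as literal text, and the \b anchors assume both markers begin and end with a word character. The sample string stands in for the contents of testfile.

```python
import re


def find_between(text, before, after):
    """Return every piece of text found between the words `before` and `after`."""
    pattern = r'\b' + re.escape(before) + r'\s+(.*?)\s+' + re.escape(after) + r'\b'
    return re.findall(pattern, text)


# Same contents as the testfile shown above
sample = 'This\nis\nmy\ntest\nfile\nthis is your test file\n'
print(find_between(sample, 'is', 'test'))  # ['my', 'your']
```

Because . does not match a newline by default, the captured text itself has to sit on one line; only the whitespace around it may include line breaks.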
Upvotes: 1