salad_bar_breath

Reputation: 281

Python read certain text from a single line

I was looking at ways to "grab" a certain part of a text file with Python when you only know what comes before and after that particular text. I want something like this answer but for single lines. For example, if I have a text file called test.txt that looks like:

This 
is 
my 
test 
file

Then I can use

with open('test.txt') as input_data:
    for line in input_data:
        if line.strip() == 'is': 
            break
    for line in input_data: 
        if line.strip() == 'test':
            break
        print(line) 

...and that works fine for grabbing my, but if my text file is a single line e.g.:

This is my test file

Then it doesn't work. I don't want to grab my by the string index because I want something that will work only based on knowing what comes before and after that part of the line. I tried looking at a lot of questions but haven't found anything.

Thank you!

Upvotes: 0

Views: 881

Answers (4)

Youan Wang

Reputation: 94

It looks like you want to grab the text between "is" and "test". A regular expression may help, like this:

import re

with open('test.txt') as input_data:
    # Non-greedy capture of the text between " is" and "test"
    matches = re.findall(r'\sis\s*(\w[\s\S]+?)\s*test', input_data.read())
    for item in matches:
        print(item)

Upvotes: 1

Hooting

Reputation: 1711

start = ' is '
end = ' test '
with open('test.txt') as input_data:
    for line in input_data:
        try:
            # index() raises ValueError when the marker is missing
            start_index = line.index(start) + len(start)
            end_index = line.index(end)
            print(line[start_index:end_index])
        except ValueError:
            print("not found in this line [%s]" % line.rstrip())

You can use index to find the positions of the start word and the end word, and then slice out the substring between them.
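
As a rough sketch, the same idea can be wrapped in a small reusable helper. The name text_between is hypothetical (it is not from any library), and this version uses str.find, which returns -1 instead of raising, and searches for the end marker only after the start marker, which is a small change from the code above:

```python
def text_between(line, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    start_pos = line.find(start)
    if start_pos == -1:
        return None  # start marker not present
    start_pos += len(start)
    # Look for the end marker only after the start marker
    end_pos = line.find(end, start_pos)
    if end_pos == -1:
        return None  # end marker not present after the start marker
    return line[start_pos:end_pos]


print(text_between('This is my test file', ' is ', ' test '))  # my
```

Returning None keeps the "not found" case explicit without a try/except at every call site.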

Upvotes: 1

Alexander

Reputation: 1967

You can get that with a regular expression:

import re

with open('test.txt') as input_data:
    for line in input_data:
        match = re.search(r' is (.*) test ', line)
        if match:
            print(line)
            print(match.group(1))

The re.search call looks for a pattern of the form " is ... test ". If it is found, the code first prints the whole line and then only the string between "is" and "test". I wasn't sure which one you would prefer.

Edit: changed the regex to include a space before "is"; otherwise "This" would have been matched as well. Removed the lookahead and lookbehind since they are not necessary.
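
To illustrate the point about the leading space, here is a small sketch comparing the pattern with and without it on the sample line. The third pattern is an assumed variation using \b word boundaries, which also copes with "test" at the very end of a line, where there is no trailing space to match:

```python
import re

line = 'This is my test file'

# Without the leading space, the "is" inside "This" starts the match
loose = re.search(r'is (.*) test ', line)
# With the leading space, only the standalone word "is" can start the match
strict = re.search(r' is (.*) test ', line)
# Assumed variation: word boundaries instead of literal spaces
bounded = re.search(r'\bis\b\s+(.*?)\s+\btest\b', 'this is my test')

print(repr(loose.group(1)))    # 'is my'
print(repr(strict.group(1)))   # 'my'
print(repr(bounded.group(1)))  # 'my'
```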

Upvotes: 3

John1024

Reputation: 113864

Let's consider this test file:

$ cat testfile
This
is
my
test
file
this is your test file

To get both matches:

>>> import re
>>> re.findall(r'\bis\s+(.*?)\s+test\b', open('testfile').read())
['my', 'your']

If we want to be more careful about making sure that the file is closed, we should use with:

>>> with open('testfile') as f:
...     re.findall(r'\bis\s+(.*?)\s+test\b', f.read())
... 
['my', 'your']

Upvotes: 1
