Reputation: 2683
I'm trying to use Python to extract the text between the headers below:
@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
The exact text of @HEADER1 and @othertext might change over time, so the solution needs to be dynamic.
Also, HEADER2 is a word that starts with an '@'. So is there a startswith function I can use? Or a regular expression?
Something like:
for line in file:
    if (line == 'HEADER1'):
        print next line
        continue = TRUE
    if (continue == TRUE):
        print(line)
    elif (line == othertext):
        break
Upvotes: 0
Views: 2466
Reputation: 99
In cases like this I use the str.partition() method:
text_to_extract = "@HEADER1\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\nExtractMe\n@othertext"
extracted = text_to_extract.partition('@HEADER1')[2].partition('@othertext')[0]
print(extracted)
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
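Since the question says the header text may change, the same partition() idea can be wrapped in a small helper that takes the markers as arguments. This is only a sketch: the name extract_between and the choice to return None when a marker is missing are my own assumptions, not part of the answer above.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing.

    Hypothetical helper: the name and the None-on-missing behaviour are
    illustrative choices, not something from the original answer.
    """
    # partition() returns (before, separator, after); the separator is
    # an empty string when the marker was not found.
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None
    body, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None
    # Drop the line breaks that sat right next to the markers.
    return body.strip("\r\n")


sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_between(sample, "@HEADER1", "@othertext"))
```

Stripping "\r\n" at both ends removes the newline left over after the start marker and the one before the end marker, so the result contains only the lines in between.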
Upvotes: 0
Reputation: 10403
This does the job
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
print('"{}"'.format(re.split(r'(@HEADER1[\n\r]|[\n\r]@othertext)', string)[2]))
Output:
"ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe"
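If the marker text is only known at run time, a variation is to build the pattern from variables and pass them through re.escape() so that any special characters in the markers are matched literally. The function name between and its parameters are assumptions made for this sketch.

```python
import re


def between(text, start, end):
    """Return the text between the start and end markers, or None.

    Hypothetical helper for illustration; start and end are plain strings.
    """
    # re.escape() makes the markers safe to embed in a pattern.
    # \r?\n accepts both Unix and Windows line endings around the body.
    pattern = re.escape(start) + r'\r?\n(.*?)\r?\n' + re.escape(end)
    # re.DOTALL lets '.' match newlines so the body can span lines.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print('"{}"'.format(between(sample, "@HEADER1", "@othertext")))
```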
Upvotes: 5
Reputation: 17054
Are you looking for something like this?
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
@othertext"""
for a in re.findall(r'@\w+(?:\r\n|\r|\n)(.*?)@\w+(?:\r\n|\r|\n)?', string, re.DOTALL):
    print(a)
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
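If you also need to know which header each block belongs to, one possible extension is to capture the header name as well and collect the results in a dict. This is a sketch under the assumption that every block is closed by some line starting with '@'; the names SECTION and sections are made up for the example.

```python
import re

# Group 1: the header name without the '@'. Group 2: the lines up to,
# but not including, the line break before the next '@word' line.
# SECTION is a hypothetical name chosen for this sketch.
SECTION = re.compile(r'^@(\w+)\r?\n(.*?)\r?\n@\w+', re.MULTILINE | re.DOTALL)

text = """@HEADER1
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
@othertext"""

# Map each opening header to the text that follows it.
sections = {m.group(1): m.group(2) for m in SECTION.finditer(text)}

for name, body in sections.items():
    print(name)
    print(body)
```

Because the closing '@othertext' is consumed as part of each match, it is not mistaken for the start of a new block.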
Upvotes: 2
Reputation: 1847
Without the re module:
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
You can play around with str.find inside a string slice. The + 1 skips the newline that ends the header line:
print(string[string.find("\n") + 1:string.find("\n@")])
Or you can split the string into a list of lines, drop the first and last elements, and join the rest back together:
lines = string.split("\n")
lines = lines[1:-1]
print("\n".join(lines))
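To answer the startswith part of the question: yes, str.startswith() exists, and the loop from the question can be written with it. Below is a rough sketch, assuming the block ends at the next line that begins with '@'. The generator name lines_between is made up for illustration.

```python
def lines_between(lines, start_marker):
    """Yield the lines after start_marker up to the next line starting with '@'.

    Hypothetical helper; `lines` can be any iterable of strings,
    including an open file object.
    """
    inside = False
    for line in lines:
        # Lines read from a file keep their line break; remove it.
        line = line.rstrip("\r\n")
        if not inside:
            # Still looking for the opening header.
            if line == start_marker:
                inside = True
        elif line.startswith("@"):
            # The next '@' line closes the block.
            break
        else:
            yield line


sample = """@HEADER1
ExtractMe
ExtractMe
@othertext
ignored"""

print("\n".join(lines_between(sample.splitlines(), "@HEADER1")))
```

Because it only looks at one line at a time, the same function should also work on a large file without loading it all into memory.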
Upvotes: 0