Reputation: 167
I have a text file containing text like this:
... Above in Table 5 , we understood the relationship between pressure and volume. It said ... and now we know ... . Table 9: represents the graph of x and y. Table 6 was all about force and it implications on objects....
I have written some code to extract the lines that contain the word "Table":
    # pathname and filename are strings, e.g. filename = 'filename.txt'
    with open(pathname + filename, 'r') as f:
        k = f.readlines()
    for line in k:
        if ' Table ' in line:
            print(line)
Now I want to print the output in a particular format:
(txt file name),(Table id),(Table content)
I do this using Python's .split method:
x = 'Paper ID:' + filename.split('.')[0] + '|' + 'Table ID:' + line.split(':')[0] + '|' + 'Table Content:' + line.split(':')[1] + '|'
As you can see, I can separate the table ID and the table content when a delimiter (:) follows the table ID, as it does for some of the tables. How do I do the same when there is no delimiter, i.e. for lines like these?
Above in Table 5 , we understood the relationship between pressure and volume. It said ... and now we know .. Or In table 7 we saw....
Could anyone please help?
Upvotes: 1
Views: 49
Reputation: 6516
You could search for the pattern Table <number> and split the line at that location.
You could use re.split(pattern, string, maxsplit=0, flags=0)
or re.findall(pattern, string, flags=0).
re.split(r'Table [0-9]+', line)[-1]
will give you what follows the last table mention in the line (the content).
re.findall(r'Table [0-9]+', line)
will give you each table mention together with its ID, from which you can extract the ID.
Python documentation on re.split and re.findall
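As a rough sketch of how the two ideas could be combined, the snippet below uses re.finditer to pull out both the ID and the content in one pass, with or without the colon. The names TABLE_RE and extract_tables are made up for illustration. It also assumes that a table's content runs until the next table mention or the end of the line, and that "table" should match in either case (as in "In table 7"); adjust both if your files differ.

```python
import re

# Hypothetical pattern: "Table 5", "table 7" or "Table 9:" (colon optional).
TABLE_RE = re.compile(r'\btable\s+(\d+)\s*:?', flags=re.IGNORECASE)

def extract_tables(line):
    """Return a list of (table_id, content) pairs found in one line.

    Assumption: the content of a table is everything up to the next
    table mention, or up to the end of the line for the last one.
    """
    matches = list(TABLE_RE.finditer(line))
    results = []
    for i, m in enumerate(matches):
        # Content ends where the next match starts (or at end of line).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        content = line[m.end():end].strip(' ,')
        results.append((m.group(1), content))
    return results

line = "Above in Table 5 , we understood pressure. Table 9: represents the graph of x and y."
for table_id, content in extract_tables(line):
    print('Table ID:' + table_id + '|' + 'Table Content:' + content + '|')
```

The 'Paper ID:' prefix from the question can be prepended to each printed row in the same way as before.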
Upvotes: 1