Textpipe: extracting text between two tags

Question

I can't for the life of me figure out how to accomplish this task with TextPipe.

TASK:

Extract (cut out) this TEXT including the start and end tag and get a file containing only these tags and the text in between.

`TEXT`

I defined a restriction filter with an end and start tag, but what's next? This filter demands a subfilter and I don't understand what exact filter I need to use next and how to customize it. I need to repeat this extraction process for several thousands of HTML files.

Steps specifically for TextPipe will be greatly appreciated, as I'm not much of a programmer myself.

Borodin · Accepted Answer

Without any further help from yourself, I can only guess that you want to remove all elements whose first child is another element with a class attribute equal to "article".

After a quick look at the TextPipe documentation it looks like it won't do anything like XPath expressions, but you should experiment with a Restrict to between tags filter and a Remove All subfilter.

Bear in mind that it is possible that TextPipe won't do what you want and you may have to look elsewhere for a solution.
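If you do end up looking elsewhere, here is a rough sketch of the same extraction in Python, in case it helps. It is only an illustration under assumptions: the real start and end tags did not survive in the question, so `START_TAG` and `END_TAG` below are hypothetical placeholders, and `extract_between` and `process_folder` are names made up for this example. It copies everything from the start tag through the end tag, both included, into a new file of the same name.

```python
import re
from pathlib import Path

# Hypothetical markers: replace these with the actual start and end tags.
START_TAG = '<div class="article">'
END_TAG = "<!-- end article -->"


def extract_between(text, start=START_TAG, end=END_TAG):
    """Return every span from `start` through `end`, including both markers."""
    # re.escape makes the markers literal; DOTALL lets '.' cross line breaks.
    # The non-greedy '.*?' stops at the first end marker after each start marker.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.findall(text)


def process_folder(src_dir, dst_dir):
    """Write the extracted spans of each .html file in src_dir to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        matches = extract_between(text)
        if matches:  # files without a match produce no output file
            (dst / path.name).write_text("\n".join(matches), encoding="utf-8")
```

Calling `process_folder("input", "output")` (both folder names are placeholders) would go through every `.html` file in one pass. Because this is plain text matching and not real HTML parsing, it stops at the first end marker it finds, so it will cut short if the end tag also occurs nested inside the block you want.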
