Use sed to remove multiple lines between 2 sets of characters

Question

I'm using sed on a Mac (macOS).

I have a set of very large financial 10-K files and I want to keep only the text.

Right now, I'm trying to remove all of the information between

XML

and

Usually there is a lot of information between the two, but here is what a sample looks like:

#Other things I want to keep
XML
10
rht-10qq3fy19_htm.xml
IDEA: XBRL DOCUMENT




#Some other text I need to keep

I've been trying to use sed without much success; I can only get it to remove single-line entries like

XML SOME WORDS SOME WORDS

I used this code to get that to work:

sed -i '' 's/XML.*//g' filename.txt

What should I change to get the result I want?

Once I can solve this, the other things I need to clean should also be easier. The solution doesn't have to use sed.

I'm using -i and '' at the beginning of the sed command because I'm on a Mac (BSD) and I'm modifying data in place.

frangaren · Accepted Answer

If I haven't misunderstood you, this will work for you:

sed '/XML/,//d' filename.txt

For anyone else looking for how to delete text between two patterns, use:

sed '/START_PATTERN/,/END_PATTERN/d' filename.txt
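
As a minimal sketch of the range form in action, assuming two hypothetical markers, BEGIN_BLOCK and END_BLOCK (placeholders for illustration, not the markers from the question), and a throwaway temp file:

```shell
# Build a small sample file. BEGIN_BLOCK and END_BLOCK are hypothetical
# placeholder markers; substitute the real start and end patterns.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/sed_demo.XXXXXX")
printf '%s\n' 'keep 1' 'BEGIN_BLOCK' 'drop a' 'drop b' 'END_BLOCK' 'keep 2' > "$tmpfile"

# Delete every line from the first marker through the second, inclusive,
# and capture what is left. The file itself is not modified here.
result=$(sed '/BEGIN_BLOCK/,/END_BLOCK/d' "$tmpfile")
printf '%s\n' "$result"

# In-place variant for BSD/macOS sed (the empty '' means "no backup file"):
# sed -i '' '/BEGIN_BLOCK/,/END_BLOCK/d' "$tmpfile"

rm -f "$tmpfile"
```

The range is inclusive, so both marker lines are deleted along with everything between them. If the end pattern never matches, sed deletes from the start marker through the end of the file, so it is worth checking the output without -i before editing in place.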
