Regex to match all text on multiple lines unless it contains a specific string?

Question

I know this question has been asked before, but none of the previous responses have worked for me. I have a PDF that I'm trying to convert in Calibre. In the conversion process, I want to get rid of the headers and footers, which look like these:


viii

Introduction



Introduction

ix

I used the following regex, which worked beautifully to select all of these instances:

(?s)
(.*?)(.*?)
(.*?)

HOWEVER, when there is a chapter title, the PDF code says this:


INTRODUCTION

which is also picked up by my regex. I want to alter my code to ignore the chapter titles. I have tried dozens of combinations replacing the

(.*?)

with things like

[^index] 
^((?!index).)*$ 
/(?s)^((?!index).)*$/

I have also tried each of these with href, =, and " in place of "index", but none of these patterns match anything. Any ideas what I need to change in my regex so I can remove the headers and footers without removing the chapter titles? Thank you in advance!
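
A minimal sketch of one possible approach, written in Python because Calibre's search-and-replace is, as far as I know, built on Python's regular-expression engine. The markup in the post did not survive, so the `<a name=...>`, `<br>`, and `<hr>` tags and the `index.html` link in `sample` are assumptions made up for illustration, as are the names `sample`, `naive`, `no_index`, `tempered`, and `cleaned`. Two observations drive it: `[^index]` is a character class that matches a single character other than i, n, d, e, or x, not "text without the word index"; and `^((?!index).)*$` is anchored, so without multiline mode it can only succeed if the entire input lacks "index". Moving the lookahead inside each wildcard (a "tempered dot") leaves the rest of the pattern intact:

```python
import re

# Hypothetical page markup; the real tags were stripped from the post.
sample = (
    "<hr>\n"
    "<a name=8></a>viii<br>\n"
    "Introduction<br>\n"
    "Body text one.<br>\n"
    "<hr>\n"
    '<a name=9></a><a href="index.html#9">INTRODUCTION</a><br>\n'
    "Body text two.<br>\n"
    "<hr>\n"
    "<a name=10></a>Introduction<br>\n"
    "ix<br>\n"
    "Body text three.<br>\n"
)

# Plain lazy wildcards: under (?s) they also swallow the chapter title.
naive = re.compile(r"(?s)<a name=\d+></a>(.*?)<br>\n(.*?)<br>\n")

# Tempered wildcards: a character is consumed only if "index" does not
# start at that position, so a match can never span the chapter link.
no_index = r"(?:(?!index).)*?"
tempered = re.compile(
    r"(?s)<a name=\d+></a>" + no_index + r"<br>\n" + no_index + r"<br>\n"
)

cleaned = tempered.sub("", sample)
print(cleaned)
```

With this made-up sample, the two running headers are removed while the line containing the `index.html` link is left alone. In a real conversion, the literal parts of the pattern would have to be replaced with whatever markup the PDF actually produces.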
