Aron

Reputation: 29

I would like to use regex to format text

I would very much appreciate any help getting the following done in Notepad++.

I have more than 40,000 lines like the ones below. All of them are English language tests. They look like this:

Q1 "I know that you don't like seafood but our friends make the best seafood Fettuccini Alfredo I have ever had.
      Will you agree to keep an ......... mind and try it before deciding you don't like it?" Jill asked her son.

  (a) open  (b) airy    (c) indifferent (d) ignorant

Q2 The boss was a scary guy. When he called you into his office, you could bet that you would receive the worse
     insults you have ever had to endure and there was something about him that would stop anyone from talking
     back to him. People immediately froze in their ......... and meekly walked into his office when he called them.

  (a) paths (b) tracks  (c) cars    (d) shoes

Q3 In ......... to change the sink, he would have to turn off the water that runs to the facet. He failed to do so and got

      a surprise when water started liberally spraying down on the kitchen floor.

   (a) ability  (b) possibility (c) plausibility    (d) order

Q4 Since the company put a set of sexual harassment rules in ......... incidents of sexual harassment were virtually

      non-existent.

   (a) ordering (b) place   (c) storage (d) foundation

Q5 "Your shed is in pretty poor ......... . The back of the foundation is sinking and there is water getting into it from

      the roof. I can't help you with the foundation but we can look for ways to seal it," Rob said to Christian.

  (a) mass  (b) density (c) support (d) shape

As you can see, the questions are not on one line: they are broken by a line break, sometimes followed by an empty line, and then some leading space characters.

I would like to achieve something like this:

Q1 "I know that you don't like seafood but our friends make the best seafood Fettuccini Alfredo I have ever had. Will you agree to keep an ......... mind and try it before deciding you don't like it?" Jill asked her son.

  (a) open  (b) airy    (c) indifferent (d) ignorant

Q2 The boss was a scary guy. When he called you into his office, you could bet that you would receive the worse insults you have ever had to endure and there was something about him that would stop anyone from talking back to him. People immediately froze in their ......... and meekly walked into his office when he called them.

  (a) paths (b) tracks  (c) cars    (d) shoes

Q3 In ......... to change the sink, he would have to turn off the water that runs to the facet. He failed to do so and got a surprise when water started liberally spraying down on the kitchen floor.

   (a) ability  (b) possibility (c) plausibility    (d) order

Q4 Since the company put a set of sexual harassment rules in ......... incidents of sexual harassment were virtually non-existent.

   (a) ordering (b) place   (c) storage (d) foundation

Q5 "Your shed is in pretty poor ......... . The back of the foundation is sinking and there is water getting into it from the roof. I can't help you with the foundation but we can look for ways to seal it," Rob said to Christian.

  (a) mass  (b) density (c) support (d) shape

So I need each question on a single line, however long that line ends up being. The answer options must stay below the questions exactly as they are; I think only the questions need to change, not the options (a, b, c, d).

Doing this manually, I would have to go line by line and delete characters until each question is on one line. With tens of thousands of questions, that would be very tedious. Is there a way to do it in Notepad++ with regex?

If it helps, each and every question starts with Q1, Q2, Q3 and so on up until Q10. All the lines that start with (a) are question options.

Upvotes: 2

Views: 59

Answers (3)

Allan

Reputation: 12438

You can use the following regex:

Q(.*)(\r?\n)+\h+(\w)

and replacement:

Q\1 \3

or

Q(.+)\v+\h+(\w)

and replacement:

Q\1 \2

Click on Replace All a couple of times and it will be done.

EXPLANATIONS:

  • Q(.+)\v+\h+(\w) matches from a Q to the end of its line (.+), followed by one or more line-break characters (\v+), then one or more horizontal whitespace characters (\h+), then a word character (\w). Requiring a word character keeps the answer lines, which start with an opening parenthesis, out of the match.
  • The replacement Q\1 \2 rebuilds the match as the Q, the first line of the question, a single space, and the first character of the following line (using backreferences), which joins the two lines.

Each pass typically joins only one continuation line per question, so click Replace All repeatedly until no occurrence is replaced.
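For anyone who prefers to script this outside Notepad++, here is a minimal Python sketch of the same idea, repeating the substitution until nothing changes. It is a translation, not the Notepad++ regex itself: Python's re module has no \h, and its \v means only a vertical tab, so line breaks and horizontal whitespace are spelled out. The function name and the shortened sample text are made up for illustration.

```python
import re

# Translation of Q(.+)\v+\h+(\w) for Python's re module (an assumption of this
# sketch): [^\r\n]+ stands in for .+ so a trailing \r is never captured,
# (?:\r?\n)+ stands in for \v+, and [ \t]+ stands in for \h+.
PATTERN = re.compile(r"Q([^\r\n]+)(?:\r?\n)+[ \t]+(\w)")


def join_question_lines(text: str) -> str:
    """Repeat the substitution until a pass replaces nothing."""
    while True:
        new_text, count = PATTERN.subn(r"Q\1 \2", text)
        if count == 0:
            return new_text
        text = new_text


# Shortened, hypothetical sample in the same layout as the question.
sample = (
    "Q2 The boss was a scary guy.\n"
    "     insults you have ever had\n"
    "     back to him.\n"
    "\n"
    "  (a) paths (b) tracks\n"
)
result = join_question_lines(sample)
print(result)
```

The loop plays the role of clicking Replace All several times: each pass attaches one continuation line to its question, and the answer line is left alone because it starts with a parenthesis rather than a word character.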

Let me know if anything is unclear.

TESTED.

Upvotes: 1

Sebastian Proske

Reputation: 8413

Two approaches:

Based on the fact that the lines you want to attach always start indented, you can use

\R++\h++([^(])

and replace with a space followed by $1 (the space keeps the joined sentences apart).

Or, based on the fact that you don't want to merge lines starting with an opening parenthesis or a question number such as Q1, you can use

\R++\h*+((?!Q\d)[^(])  

and again replace with a space followed by $1.
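As a rough illustration of the second approach, here is a Python sketch that merges every wrapped line in one pass. It is an adaptation under stated assumptions, not the exact Notepad++ pattern: Python's re has no \R or \h, and possessive quantifiers only arrived in Python 3.11, so this version instead forbids whitespace in the captured character to stop the engine from backtracking into the indentation. The replacement puts a space in front of the captured character so the joined words stay separated. The function name and the abbreviated sample are hypothetical.

```python
import re

# Adapted from \R++\h*+((?!Q\d)[^(]) for Python's re module (an assumption of
# this sketch): (?:\r?\n)+ replaces \R++, [ \t]* replaces \h*+, and [^(\s]
# replaces [^(] so backtracking cannot capture a space or a line break.
JOIN = re.compile(r"(?:\r?\n)+[ \t]*(?!Q\d)([^(\s])")


def merge_wrapped_lines(text: str) -> str:
    """Join continuation lines; keep option lines and new questions apart."""
    return JOIN.sub(r" \1", text)


# Abbreviated sample in the same layout as the question.
sample = (
    "Q3 In ......... to change the sink, he failed to do so and got\n"
    "\n"
    "      a surprise.\n"
    "\n"
    "   (a) ability  (b) possibility\n"
    "\n"
    "Q4 Since the company put rules in ......... incidents were virtually\n"
    "\n"
    "      non-existent.\n"
)
merged = merge_wrapped_lines(sample)
print(merged)
```

Because the pattern starts at the line break rather than at the Q, a single pass handles questions that wrap over any number of lines.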

Upvotes: 2

Jokab

Reputation: 2936

Try this with Find+Replace:

(\n\s+)(\s\w)

Replace with:

$2
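This pattern happens to be valid in Python's re module as written, so its behaviour can be checked with a short sketch (Python writes the backreference as \2 instead of $2). The sample text below is made up, and the sketch assumes plain \n line endings and continuation lines indented by at least two spaces; with \r\n endings the \r would be left behind.

```python
import re

# (\n\s+) swallows the line break and all but the last whitespace character;
# (\s\w) keeps that last whitespace plus the first letter of the next line.
pattern = re.compile(r"(\n\s+)(\s\w)")

# Hypothetical sample in the same layout as the question.
sample = (
    "Q1 first part of the question.\n"
    "      Second part.\n"
    "\n"
    "  (a) open  (b) airy\n"
    "\n"
    "Q2 next question\n"
)
joined = pattern.sub(r"\2", sample)
print(joined)
```

The option lines survive because "(" is not a word character, and unindented lines such as Q2 survive because there is no whitespace directly in front of the Q other than line breaks.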

Upvotes: 0
