David

Reputation: 524

Capturing all the text until the next occurrence including the line breaks in regex (Python 3.x)

I have a .tex file and I want to prepare a standalone document from it for each question. In a nutshell, I need each question and its choices in the following format at the end.

[
    {'question': 
        {'text' : 'If $\vec A = t^2 \vec i -t \vec j + (2t +1)\vec k$, what is  $\dfrac{dA}{dt}$?',
        'marks': 2
        },
     'choices': [{'text': '$2t\vec i -  \vec j + 2\vec k$', 'is_correct': True }, {'text': '$2t\vec i +  \vec j + 2\vec k$', 'is_correct': False, ...}]
    },
...]

I am using Python 3.x to write regular expressions that search the file and do the job. The following is a minimal example:

 \documentclass{exam} 
  
  
  \begin{questions}
  
  \question[2] If $\vec A = t^2 \vec i -t \vec j + (2t +1)\vec k$, what is  $\dfrac{dA}{dt}$?
  \begin{multicols}{2}
    \begin{choices} 
    \correctchoice   $2t\vec i -  \vec j + 2\vec k$
  
    \choice  $2t\vec i +  \vec j + 2\vec k$
  
    \choice  $t\vec i -  \vec j + 2\vec k$
  
    \choice  None of these
   
    \end{choices}
  \end{multicols}
      
  \question[2] Is $\nabla \phi $ is perpendicular
   to the surface $\phi(x,y,z) = c$ where $c$ is a constant?
  \begin{multicols}{2}
    \begin{choices} 
    \choice  Not all the times
  
    \correctchoice   Yes, Always
  
    \choice  No, Never
  
    \choice  None of these
   
    \end{choices}
  \end{multicols}
      
  
      
  \end{questions}
  \end{document}
 

I am stuck at writing a regular expression that splits the file into separate questions, each with its choices. I have tried the following:

import re

with open('qpaper.tex', 'r') as f:
    content = f.read()
    regex = re.compile(r"\\question.*")
    qns = regex.findall(content)
    for q in qns:
        print(q)

This gives only the text from \question up to the first line break. In my case, I need everything up to the next occurrence of \question.

Note: All my question papers have exactly the same format, and all the questions are inside \begin{questions}...\end{questions}. If there is a better way to achieve this, that would also be helpful.

Upvotes: 1

Views: 48

Answers (1)

Ryszard Czech

Reputation: 18621

Use

(?s)\\question.*?(?=\\question|\Z)


This expression extracts substrings starting with \question up to the next closest \question or end of string (\Z).
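As a minimal sketch of how the pattern could be used from Python, the snippet below runs it with re.findall on a shortened sample. The sample text and the variable names (content, pattern, blocks) are made up for illustration; in the real script, content would come from reading qpaper.tex.

```python
import re

# Shortened, made-up sample; the real input would be the contents of qpaper.tex.
content = r"""
\begin{questions}
\question[2] First question
spanning two lines?
\begin{choices}
\correctchoice Yes
\choice No
\end{choices}
\question[2] Second question?
\begin{choices}
\choice A
\correctchoice B
\end{choices}
\end{questions}
"""

# (?s) lets . match line breaks, so each match can run across several lines.
pattern = re.compile(r"(?s)\\question.*?(?=\\question|\Z)")
blocks = pattern.findall(content)

for block in blocks:
    print(block)
    print("-----")
```

Note that the last block runs to the end of the string, so it also contains \end{questions}. If that is unwanted, adding \\end\{questions\} as a further alternative inside the lookahead is one way to stop earlier.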

EXPLANATION:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  \\                       '\'
--------------------------------------------------------------------------------
  question                 'question'
--------------------------------------------------------------------------------
  .*?                      any character (0 or more times (matching
                           the least amount possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    question                 'question'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \Z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
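Building on the pattern above, here is a rough sketch of how each extracted block might be turned into the dictionary layout from the question. It is not the only way to do it, and it rests on a few assumptions: the question text ends at the first \begin{...}, the marks appear as \question[n], and each choice runs until the next \choice, \correctchoice or \end{choices}. The helper name parse_block and the sample text are hypothetical. Whitespace inside the texts is collapsed to single spaces.

```python
import re

# Same splitting pattern as above: one match per \question block.
BLOCK_RE = re.compile(r"(?s)\\question.*?(?=\\question|\Z)")
# Assumption: optional [marks], then question text up to the first \begin{...}.
HEAD_RE = re.compile(r"\\question(?:\[(\d+)\])?(.*?)(?=\\begin\{|\Z)", re.S)
# Assumption: a choice ends at the next choice or at \end{choices}.
CHOICE_RE = re.compile(
    r"\\(correctchoice|choice)\b(.*?)"
    r"(?=\\correctchoice|\\choice\b|\\end\{choices\}|\Z)",
    re.S,
)


def parse_block(block):
    """Turn one \\question block into the dict layout asked for (hypothetical helper)."""
    head = HEAD_RE.match(block)
    marks = int(head.group(1)) if head.group(1) else None
    text = " ".join(head.group(2).split())  # collapse line breaks and extra spaces
    choices = [
        {"text": " ".join(m.group(2).split()),
         "is_correct": m.group(1) == "correctchoice"}
        for m in CHOICE_RE.finditer(block)
    ]
    return {"question": {"text": text, "marks": marks}, "choices": choices}


# Made-up sample in the same format as the question's .tex file.
content = r"""
\begin{questions}
\question[2] Is $\nabla \phi$ perpendicular
 to the surface $\phi(x,y,z) = c$?
\begin{multicols}{2}
  \begin{choices}
  \choice  Not all the times

  \correctchoice   Yes, Always

  \end{choices}
\end{multicols}
\question[3] What is $1 + 1$?
\begin{choices}
\correctchoice $2$
\choice $3$
\end{choices}
\end{questions}
\end{document}
"""

parsed = [parse_block(b) for b in BLOCK_RE.findall(content)]
for item in parsed:
    print(item)
```

A question whose text itself contains a \begin{...} environment would be cut short by this sketch, so the head pattern would need adjusting for such papers.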

Upvotes: 1
