I am trying to write a regex pattern to grab part of a string. The file contains certain headers, and all of the headers have the same format. I'm currently using Python and would like to keep it that way.
Here is an example file that I came across:
TI TEST TEST TEST TEST TEST TEST TEST TEST AJSAOISJAO SOAI
ASASPAOS
SO EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA EITCHA
AB Purpose
To examine the evidence supporting the use of simulation-based assessments as surrogates for patient-related outcomes assessed in the workplace.
Method
The authors systematically searched MEDLINE, EMBASE, Scopus, and key journals through February 26, 2013. They included original studies that assessed health professionals and trainees using simulation and then linked those scores with patient-related outcomes assessed in the workplace. Two reviewers independently extracted information on participants, tasks, validity evidence, study quality, patent-related and simulation-based outcomes, and magnitude of correlation. All correlations were pooled using random-effects meta-analysis.
Results
Of 11,628 potentially relevant articles, the 33 included studies enrolled 1,203 participants, including postgraduate physicians (n = 24 studies), practicing physicians (n = 8), medical students (n = 6), dentists (n = 2), and nurses (n = 1). The pooled correlation for provider behaviors was 0.51 (95% confidence interval [Cl], 0.38 to 0.62; n = 27 studies); for time behaviors, 0.44 (95% Cl, 0.15 to 0.66; n = 7); and for patient outcomes, 0.24(95% Cl, 0.02 to 0.47; n = 5). Most reported validity evidence was favorable, though studies often included only correlational evidence. Validity evidence of internal structure (n = 13 studies), content (n = 12), response process (n = 2), and consequences (n = 1) were reported less often. Three tools showed large pooled correlations and favorable (albeit incomplete) validity evidence.
Conclusions
Simulation-based assessments often correlate positively with patient-related outcomes. Although these surrogates are imperfect, tools with established validity evidence may replace workplace-based assessments for evaluating select procedural skills.
OI MANEIRAO MANEIRAOMANEIRAOMANEIRAO MANEIRAO
SN 6516516516
EI 849819981981
PD FEB
PY 2015
My current objective is to capture the entire text of the 'AB' header. Note that the length and format of the AB contents don't change much: it is pretty much always paragraphs, or a single line of text, up to the next header.
I've tried a bunch of different regex patterns; the one that got me closest to what I want is:
\nAB ((.*?\n)+)(\n[A-Z]{2}\s)?
However, it keeps matching until the end of the file, consuming every header it finds. I would like the pattern to stop matching when it reaches the next header after AB, whatever that header may be.
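A likely reason the pattern above overruns: the `+` on the line group is greedy, and the stop group `(\n[A-Z]{2}\s)?` is optional, so the engine can keep consuming lines to the end of the file and then match the stop group zero times. A minimal sketch of one fix with Python's `re`, keeping the shape of that pattern: make the line repetition lazy (`+?`) and replace the optional stop group with a mandatory lookahead that also accepts end-of-string. Because each repetition already ends in `\n`, the lookahead only needs `[A-Z]{2}\s`. The `record` string is a shortened, hypothetical copy of the example file, and `AB_PATTERN` is an illustrative name.

```python
import re

# Shortened copy of the example record above (body text trimmed).
record = (
    "TI TEST TEST TEST\n"
    "SO EITCHA EITCHA\n"
    "AB Purpose\n"
    "To examine the evidence supporting simulation-based assessments.\n"
    "Method\n"
    "The authors systematically searched MEDLINE, EMBASE, and Scopus.\n"
    "Conclusions\n"
    "Tools with established validity evidence may replace workplace-based assessments.\n"
    "OI MANEIRAO MANEIRAO\n"
    "SN 6516516516\n"
    "PY 2015\n"
)

# (?:.*\n)+?  whole lines, repeated lazily (as few as possible); without
#             re.DOTALL, '.' never matches '\n', so each repetition is one line.
# (?=...)     zero-width and mandatory: the next header ("XY ") or the end
#             of the string (\Z) must follow the captured lines.
AB_PATTERN = re.compile(r"\nAB ((?:.*\n)+?)(?=[A-Z]{2}\s|\Z)")

match = AB_PATTERN.search(record)
abstract = match.group(1).rstrip("\n") if match else None
print(abstract)
```

This follows the header rule literally, so a body line that happens to start with two capitals and a space (for example "CI ...") would be treated as the next header.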
The headers always follow the same pattern: a line break, then two uppercase letters and a space, i.e.:
\n[A-Z]{2}\s
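Since every header follows that rule, another option is to split the whole record at each newline that is followed by a header and read AB (or any other field) from the result. A minimal sketch, assuming each tag appears at most once per record (a repeated tag overwrites the earlier value here); `parse_record` is a hypothetical helper name, and the sample `record` is a trimmed copy of the example above.

```python
import re


def parse_record(text):
    """Split a tagged record into {tag: body} using the header rule:
    a line break, then two uppercase letters, then whitespace."""
    fields = {}
    # Split only at newlines followed by a header; the lookahead leaves
    # the header itself at the start of the next chunk.
    for chunk in re.split(r"\n(?=[A-Z]{2}\s)", text.strip("\n")):
        tag, body = chunk[:2], chunk[3:]
        fields[tag] = body.strip()
    return fields


record = (
    "TI TEST TEST TEST\n"
    "AB Purpose\n"
    "To examine the evidence.\n"
    "Conclusions\n"
    "Simulation-based assessments often correlate positively.\n"
    "OI MANEIRAO\n"
    "PY 2015\n"
)

fields = parse_record(record)
print(fields["AB"])
print(fields["PY"])
```

Splitting once gives access to every field, which may be convenient if other headers are needed later.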
Thanks to whomever helps in any way.
My question is different from the usual greedy vs. non-greedy questions, because the stop condition is not a single character but an entire "stop" group.