Reputation: 43
I would like to use bash to extract the text that lies between two strings in a file. There are already some answers to this, e.g.:
Print text between two strings on the same line
However, I need to do this for multiple occurrences, sometimes on the same line and sometimes on different lines. For example, starting with a file like this:
\section{The rock outcrop pools experimental system} \label{intro:rockpools}
contain pools at their summit \parencite{brendonck_pools_2010} that have weathered into the rock over time \parencite{bayly_aquatic_2011} through chemical weathering after water collecting at the rock surface \parencite{lister_microgeomorphology_1973}.
Classification depends on dimensions \parencite{twidale_gnammas_1963}.
I would like to retrieve:
brendonck_pools_2010
bayly_aquatic_2011
lister_microgeomorphology_1973
twidale_gnammas_1963
I imagine sed should be able to do this, but I'm not sure where to start.
Upvotes: 0
Views: 946
Reputation: 58351
This might work for you (GNU sed):
sed '/\\parencite{\([^}]*\)}/!d;s//\n\1\n/;s/^[^\n]*\n//;P;D' file
Delete any line that does not contain the required string. Replace the first occurrence with its key surrounded by newlines, then remove everything up to and including the first newline. Print up to and including the following newline, delete what was printed, and repeat on the rest of the line.
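The loop described above may be easier to follow with one sed command per line and a comment on each. This is a sketch wrapped in a shell function whose name, parencite_keys, is hypothetical; it assumes GNU sed, since \n in the replacement and inside [^\n] are GNU extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every \parencite{...} key read from stdin,
# one per line. Assumes GNU sed (\n in s/// and in [^\n] is a GNU extension).
parencite_keys() {
  sed '
    # No \parencite{...} left in this line (or remainder)? Delete it, read the next line.
    /\\parencite{\([^}]*\)}/!d
    # An empty regex reuses the one above: replace the first match with \n key \n.
    s//\n\1\n/
    # Drop everything up to and including the first newline, so the key comes first.
    s/^[^\n]*\n//
    # Print the key, delete it, and restart the cycle on the rest of the line.
    P
    D
  '
}

# Usage with part of the sample text from the question
parencite_keys <<'EOF'
contain pools at their summit \parencite{brendonck_pools_2010} that have weathered into the rock over time \parencite{bayly_aquatic_2011}.
Classification depends on dimensions \parencite{twidale_gnammas_1963}.
EOF
```

Because D restarts the cycle without reading new input, a line with several citations is processed repeatedly until none are left, and only then does sed move on to the next line.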
Upvotes: 0
Reputation: 784898
Using grep -oP:
grep -oP '\\parencite\{\K[^}]+' file
brendonck_pools_2010
bayly_aquatic_2011
lister_microgeomorphology_1973
twidale_gnammas_1963
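In the grep command above, -o prints only the matched text, one match per output line, so several citations on the same line each come out separately; -P enables Perl-compatible regexes, and \K discards the \parencite{ prefix matched so far from what gets printed. Here is a commented sketch, assuming a grep built with PCRE support (common for GNU grep, but not guaranteed on every system); the function name parencite_keys_grep is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each \parencite key from stdin, one per line.
# Assumes a grep that supports -P (Perl-compatible regular expressions).
parencite_keys_grep() {
  # \\parencite\{  matches the literal text \parencite{
  # \K             resets the start of the reported match, so the prefix is not printed
  # [^}]+          the key itself: everything up to the closing brace
  grep -oP '\\parencite\{\K[^}]+'
}

# Usage: two citations on the same line come out on separate lines
printf '%s\n' 'summit \parencite{brendonck_pools_2010} over time \parencite{bayly_aquatic_2011}.' |
  parencite_keys_grep
```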
Or using GNU awk (gawk):
awk -v FPAT='\\\\parencite{[^}]+' '{for (i=1; i<=NF; i++) {
sub(/\\parencite{/, "", $i); print $i}}' file
brendonck_pools_2010
bayly_aquatic_2011
lister_microgeomorphology_1973
twidale_gnammas_1963
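FPAT is specific to GNU awk, so the command above will not work as intended in mawk or BusyBox awk. A swapped-in alternative, not taken from the answer itself, is to loop over POSIX match() and use RSTART and RLENGTH to cut each key out. This sketch should run in any POSIX awk; the function name parencite_keys_awk is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every \parencite key from stdin using only
# POSIX awk features (match, RSTART, RLENGTH, substr), i.e. no gawk FPAT.
parencite_keys_awk() {
  awk '{
    line = $0
    # Find the next \parencite{key} in what is left of the line
    while (match(line, /\\parencite[{][^}]*[}]/)) {
      # Skip the 11-character prefix "\parencite{" and drop the closing "}"
      print substr(line, RSTART + 11, RLENGTH - 12)
      # Continue searching after this citation
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

# Usage
printf '%s\n' 'over time \parencite{bayly_aquatic_2011} through \parencite{lister_microgeomorphology_1973}.' |
  parencite_keys_awk
```

Writing the braces as [{] and [}] keeps them literal in every awk, whether or not it supports interval expressions such as {2,3}.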
Upvotes: 1
Reputation: 67467
This two-stage extraction might be easier to understand, and it does not need Perl regexes (the input file here is named cite):
$ grep -o "parencite{[^}]*}" cite | sed 's/parencite{//;s/}//'
brendonck_pools_2010
bayly_aquatic_2011
lister_microgeomorphology_1973
twidale_gnammas_1963
Or, as always, awk to the rescue!
$ awk -F'[{}]' -v RS=" " '/parencite/{print $2}' cite
brendonck_pools_2010
bayly_aquatic_2011
lister_microgeomorphology_1973
twidale_gnammas_1963
Upvotes: 1