user1886912

Reputation: 31

Retrieving text between two terms

I have looked at several answers related to this question, but I still cannot get exactly what I need to work.

Essentially, I am writing a paper that uses bibliographic codes in a mark-up container. This may occur in several ways, e.g.:

\citet{bibcode}
\citep{bibcode}
\citet{bibcode1,bibcode2}
\citep[randomtext]{bibcode}

etc.

I am trying to compile a list of just these bibcodes. Where a single container holds multiple bibcodes, they are separated by commas.

Currently, I am using:

sed -n 's:.*\cite.*{\(.*\)}.*:\1:p' sample.tex

It works in some instances, but not all. It still seems to get distracted by other uses of curly brackets and picks up a lot of unnecessary text.

Any help regarding this matter would be highly appreciated.

Thank you in advance.

Upvotes: 1

Views: 32

Answers (2)

Ed Morton

Reputation: 204731

This will work for the sample input you gave:

$ cat tst.awk
BEGIN { FS="[{},]" }
/\\cite/ {
    for (i=2;i<NF;i++) {
        if (!seen[$i]++) {
            print $i
        }
    }
}

$ awk -f tst.awk file
bibcode
bibcode1
bibcode2

If your real input is more complicated or harder to parse than that, update your question to show input that demonstrates the problem more accurately, along with the output you are looking for.
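If the real input turns out to have several citations on one line, or other brace groups such as \textbf{...} on the same line, a variation along these lines might cope. This is only a sketch: the function name extract_all_bibcodes is made up for illustration, and it assumes the optional [...] arguments contain no braces and that each citation sits on a single line.

```shell
# Hypothetical helper: print each distinct bibcode once, in order of first appearance.
# Assumes no braces inside the optional [...] arguments and no citations split across lines.
extract_all_bibcodes() {
    awk '
    {
        line = $0
        # Repeatedly find the next \cite...{...} command on this line.
        while (match(line, /\\cite[^{]*\{[^}]*\}/)) {
            cmd  = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            sub(/^[^{]*\{/, "", cmd)    # drop the leading "\citep[...]{"
            sub(/\}$/, "", cmd)         # drop the closing "}"
            n = split(cmd, codes, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", codes[i])   # trim spaces around each code
                if (codes[i] != "" && !seen[codes[i]]++) {
                    print codes[i]
                }
            }
        }
    }' "$1"
}
```

It would be called as extract_all_bibcodes sample.tex. Because it only looks inside matches that start with \cite, brace groups belonging to other commands are skipped.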

Upvotes: 0

user1902824

Reputation:

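Assuming there is no more than one citation on each line, you could adjust your regex to something like the following. The \\ matches a literal backslash, [^{]* skips the rest of the command name and any optional [...] argument, and [^}]* stops the capture at the first closing brace: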

sed -n 's:.*\\cite[^{]*{\([^}]*\)}.*:\1:p' sample.tex
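To go from that to one bibcode per line, the output could be piped through tr and a de-duplication step. This is a rough sketch under the same one-citation-per-line assumption; the function name extract_bibcodes is just for illustration.

```shell
# Hypothetical helper: list unique bibcodes, one per line.
# Assumes at most one \cite... command per input line.
extract_bibcodes() {
    sed -n 's:.*\\cite[^{]*{\([^}]*\)}.*:\1:p' "$1" |
        tr ',' '\n' |               # split "bibcode1,bibcode2" into separate lines
        sed 's/^ *//; s/ *$//' |    # trim spaces left around the commas
        awk '!seen[$0]++'           # keep the first occurrence of each code
}
```

Usage would be extract_bibcodes sample.tex. If a line holds more than one citation, only the last one on that line is picked up, since the leading .* is greedy.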

Upvotes: 1
