Reputation: 31
I have looked at several of the answers related to this question, but I still cannot get exactly what I need to work.
Essentially, I am writing a paper that uses bibliographic codes in a mark-up container. This may occur in several ways, e.g.:
\citet{bibcode}
\citep{bibcode}
\citet{bibcode1,bibcode2}
\citep[randomtext]{bibcode}
etc.
I am trying to compile a list of purely these bibcodes. Where there are multiple bibcodes in a single container, they are separated by a comma.
Currently, I am using:
sed -n 's:.*\cite.*{\(.*\)}.*:\1:p' sample.tex
It works in some instances, but not all. It still seems to get distracted by other uses of curly brackets and picks up a lot of unnecessary text.
Any help regarding this matter would be highly appreciated.
Thank you in advance.
Upvotes: 1
Views: 32
Reputation: 204731
This will work for the sample input you gave:
$ cat tst.awk
BEGIN { FS="[{},]" }
/\\cite/ {
    for (i=2;i<NF;i++) {
        if (!seen[$i]++) {
            print $i
        }
    }
}
$ awk -f tst.awk file
bibcode
bibcode1
bibcode2
If your real input is more complicated or harder to parse than that, then update your question to show input that more accurately demonstrates your problem, along with the output you are looking for.
Upvotes: 0
Reputation:
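Assuming there is no more than one citation on each line, you could adjust your regex to something like this (used with sed -n, as in your original command):
s:.*\\cite[^{]*{\([^}]*\)}.*:\1:p
The doubled backslash matches a literal \ before cite, [^{]* skips the command suffix and any optional [text] argument, and [^}]* stops the capture at the first closing brace instead of letting a greedy .* run on to a later one.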
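If more than one citation can appear on a line, a pipeline along the following lines might be worth trying. It is only a sketch under a few assumptions: that your grep supports -o (GNU and BSD grep do, but it is not in POSIX), that bibcodes never contain braces, and that the function name extract_bibcodes is just a hypothetical label chosen for illustration.

```shell
# Hypothetical helper: print each unique bibcode found in a .tex file, one per line.
extract_bibcodes() {
    # grep -o prints every \cite...{...} match on its own line, so several
    #   citations on one input line are all picked up; other {...} groups
    #   (e.g. \textbf{...}) are ignored. Assumes a grep that supports -o.
    # The first sed drops everything up to the last "{" and the trailing "}".
    # tr splits comma-separated keys; the second sed trims surrounding spaces.
    # awk skips empty lines and keeps only the first occurrence of each key.
    grep -o '\\cite[a-zA-Z]*\(\[[^]]*]\)*{[^}]*}' "$1" |
        sed 's/.*{//; s/}$//' |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//' |
        awk 'NF && !seen[$0]++'
}
```

It would be called as extract_bibcodes sample.tex. Starred variants such as \citet*{...} are not matched by this pattern and would need the regex extended.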
Upvotes: 1