Given a text file (.tex) which may contain strings of the form "\cite{alice}", "\cite{bob}", and so on, I would like to write a bash script that stores the content within the curly braces of each such string ("alice" and "bob") in a new text file (say, .txt). In the output file I would like one line per key, and I would also like to avoid repetitions.
Attempts:
Answer:
You can use grep -o and postprocess its output:
grep -o '\\cite{[^{}]*}' file.tex |
sed 's/\\cite{\([^{}]*\)}/\1/'
If there can only ever be a single \cite on an input line, a plain sed script suffices:
sed -n 's/.*\\cite{\([^{}]*\)}.*/\1/p' file.tex
(It's by no means impossible to refactor this into a script which extracts multiple occurrences per line; but good luck understanding your code six weeks from now.)
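To see why the sed one-liner is limited to one \cite per line, here is a small sketch on a made-up input line (the variable names line and last_key are illustrative only). The leading .* is greedy, so it consumes everything up to the last \cite{ on the line, and only the final key ends up in the capture group.

```shell
#!/usr/bin/env bash
# Hypothetical input: one line containing two citations.
line='see \cite{alice} and \cite{bob}'

# The first .* is greedy, so it runs up to the LAST "\cite{" on the line;
# the capture group therefore holds only the final key.
last_key=$(printf '%s\n' "$line" | sed -n 's/.*\\cite{\([^{}]*\)}.*/\1/p')
printf '%s\n' "$last_key"   # prints: bob
```

The key alice is silently lost, which is why the grep -o pipeline is the safer choice when a line may hold several citations.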
As usual, pipe the output through sort -u to remove any repetitions.
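Putting the grep -o, sed and sort -u steps together, a minimal bash script along these lines should produce the file the question asks for. The function name extract_cites, the sample text and the temporary files are hypothetical; the sketch assumes keys contain no nested braces and that each \cite carries a single key with no optional [...] argument (\cite{alice,bob} would come out as one line, alice,bob).

```shell
#!/usr/bin/env bash
# extract_cites (hypothetical helper): print the keys of all \cite{...}
# commands in the given .tex file, one per line, without duplicates.
extract_cites() {
    # grep -o prints every \cite{...} match on its own line,
    # sed strips the \cite{ and } wrapper, sort -u drops repetitions.
    grep -o '\\cite{[^{}]*}' "$1" |
        sed 's/\\cite{\([^{}]*\)}/\1/' |
        sort -u
}

# Demo input: two citations on one line and a repeated key.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' 'As shown by \cite{bob} and \cite{alice},' \
              'see also \cite{alice}.' > "$tmpdir/file.tex"

extract_cites "$tmpdir/file.tex" > "$tmpdir/cites.txt"
cat "$tmpdir/cites.txt"
```

For a real document you would call it as extract_cites file.tex > cites.txt.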
Here's a brief Awk attempt:
awk -v RS='\\' '/^cite\{/ {
    split($0, g, /[{}]/)
    cite[g[2]]++ }
END { for (cit in cite) print cit }' file.tex
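This conveniently does not print any duplicates and trivially handles multiple citations per line. Note that for (cit in cite) visits the keys in an unspecified order, so pipe the result through sort if you need it ordered.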
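A quick check of the Awk approach on made-up input (the temporary file and the array name seen are arbitrary). Passing RS='\\' through -v leaves awk with a single backslash as the record separator, so every record after the first starts right after a backslash; because for (k in seen) has no defined order, the sketch sorts the result.

```shell
#!/usr/bin/env bash
# Hypothetical sample: two citations on one line, plus a repeated key.
tex=$(mktemp)
printf '%s\n' 'Compare \cite{bob} with \cite{alice}.' 'Again \cite{bob}.' > "$tex"

# Records are split on "\"; a record starting with "cite{" carries one key,
# which split() on { and } puts in g[2]. Using it as an array index
# collapses duplicates.
keys=$(awk -v RS='\\' '/^cite\{/ {
    split($0, g, /[{}]/)
    seen[g[2]]++
}
END { for (k in seen) print k }' "$tex" | sort)

printf '%s\n' "$keys"
rm -f "$tex"
```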
Answer:
What about:
grep -oP '(?<=\\cite{)[^}]+(?=})' sample.tex | sort -u > cites.txt
- -P with GNU grep interprets the regexp as a Perl-compatible one (needed for the lookbehind and lookahead groups).
- -o "prints only the matched (non-empty) parts of a matching line, with each such part on a separate output line" (see the manual).
- [^}]+ matches one or more characters other than a right curly brace, preceded by \cite{ (positive lookbehind group (?<=\\cite{)) and followed by a right curly brace (positive lookahead group (?=})).
- sort -u sorts and removes duplicates.

For more details about lookahead and lookbehind groups, see the dedicated page on Regular-Expressions.info.
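Since -P is a GNU grep option that is only available when grep was built with PCRE support, other grep implementations may reject it. A hedged sketch that runs the lookaround one-liner when -P works and otherwise falls back to a plain grep -o and sed pipeline; the sample text and temporary files are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample with a repeated key.
tex=$(mktemp)
out=$(mktemp)
printf '%s\n' 'Both \cite{carol} and \cite{alice} agree; see \cite{carol}.' > "$tex"

if echo x | grep -qP 'x' 2>/dev/null; then
    # Lookbehind and lookahead are zero-width, so -o prints only the key.
    grep -oP '(?<=\\cite{)[^}]+(?=})' "$tex" | sort -u > "$out"
else
    # Fallback for greps without -P: match the whole \cite{...}, then strip it.
    grep -o '\\cite{[^}][^}]*}' "$tex" | sed 's/\\cite{\([^}]*\)}/\1/' | sort -u > "$out"
fi

cat "$out"
rm -f "$tex"
```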