Reputation: 4742
Is there a way to convert an img tag containing an alt attribute (in an HTML file),
<img src="pics/01.png" alt="my very first pic"/>
to an image link plus caption (org file),
#+CAPTION: my very first pic
[[pics/01.png]]
using pandoc?
I'm calling pandoc like this:
$ pandoc -s -r html index.html -o index.org
where index.html contains the img tag from above, but pandoc doesn't add the caption to the output org file:
[[pics/01.png]]
Upvotes: 0
Views: 73
Reputation: 4742
OP here. I didn't manage to make pandoc bend to my needs in this case. But a little bash scripting with some awk help does the trick. The script replaces all img tags with org-mode equivalents plus captions. Pandoc leaves these alone when converting from html to org-mode.
The awk script,
# replace_img.awk
#
# Sample input:
# <img src="/pics/01.png" alt="my very first pic"/>
# Sample output:
# #+CAPTION: my very first pic
# [[/pics/01.png]]
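BEGIN {
    # Split each line into fields at double quotes.
    FS = "\""
}
# Replace every line that starts with an img tag with its org-mode equivalent.
# With FS set to a double quote, $2 is the src value and $4 is the alt value.
/^<img src/ {
    print "#+CAPTION: " $4
    print "[[" $2 "]]"
}
# Leave the rest of the file intact.
!/^<img src/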
and the bash script,
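#!/usr/bin/env bash
# replace_img.sh
# Rewrite every .php file below the current directory in place.
# -print0 with read -d '' keeps file names containing spaces intact.
find . -name "*.php" -print0 | while IFS= read -r -d '' file; do
    awk -f replace_img.awk "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done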
Place these files at the root of the project, run chmod +x replace_img.sh, and then run the script: ./replace_img.sh. Change the file extension in the script if needed. I had over 300 php files.
Upvotes: 0
Reputation: 39413
Currently the Org Writer unfortunately throws away the image alt and title strings. Feel free to submit an issue or patch if there's a way to do alt text in Org.
You can also always write a filter to modify the doc AST and add the alt text to an additional paragraph.
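As a rough illustration of the filter idea, here is a minimal sketch of a JSON filter in Python. It is a variant of the suggestion above, not a tested recipe: instead of adding a separate paragraph, it replaces each paragraph that holds a single image with a raw Org block, so that the caption line sits directly above the link. The file name caption_filter.py and the function names are made up for this example, and the sketch assumes that the HTML reader produces a Para or Plain block containing only the Image, that the alt text is ordinary inline text, and that only top-level blocks need handling.

```python
#!/usr/bin/env python3
# caption_filter.py (hypothetical name): a sketch of a pandoc JSON filter.
import json
import sys

# Inline node types whose content is itself a list of inline nodes.
WRAPPERS = ("Emph", "Strong", "Strikeout", "Superscript",
            "Subscript", "SmallCaps", "Underline")


def stringify(inlines):
    """Flatten a list of pandoc inline nodes to plain text."""
    parts = []
    for node in inlines:
        kind = node["t"]
        if kind == "Str":
            parts.append(node["c"])
        elif kind in ("Space", "SoftBreak", "LineBreak"):
            parts.append(" ")
        elif kind in WRAPPERS:
            parts.append(stringify(node["c"]))
    return "".join(parts)


def add_captions(blocks):
    """Replace image-only paragraphs with a raw Org caption plus link.

    Only top-level blocks are inspected; images nested inside divs,
    lists or block quotes are left untouched in this sketch.
    """
    out = []
    for block in blocks:
        inlines = block.get("c") if block["t"] in ("Para", "Plain") else None
        if inlines and len(inlines) == 1 and inlines[0]["t"] == "Image":
            # An Image node carries [attributes, alt inlines, [url, title]].
            _attr, alt, target = inlines[0]["c"]
            caption = stringify(alt)
            if caption:
                raw = "#+CAPTION: " + caption + "\n[[" + target[0] + "]]"
                out.append({"t": "RawBlock", "c": ["org", raw]})
                continue
        out.append(block)
    return out


# pandoc passes the output format as the first argument to a JSON filter,
# so the script only reads stdin when it is actually run as a filter.
if __name__ == "__main__" and len(sys.argv) > 1:
    doc = json.load(sys.stdin)
    if sys.argv[1] == "org":
        doc["blocks"] = add_captions(doc["blocks"])
    json.dump(doc, sys.stdout)
```

Assuming the script is saved next to index.html, it could be tried with: pandoc -s -r html index.html --filter ./caption_filter.py -o index.org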
Upvotes: 1