mihai

Reputation: 4742

How to generate a caption from an img alt attribute

Is there a way to convert an img tag containing an alt attribute (in an HTML file),

<img src="pics/01.png" alt="my very first pic"/>

to an image link plus caption (org file),

#+CAPTION: my very first pic
[[pics/01.png]]

using pandoc?

I'm calling pandoc like this:

$ pandoc -s -r html index.html -o index.org

where index.html contains the img tag above, but the output Org file has no caption:

[[pics/01.png]]

Upvotes: 0

Views: 73

Answers (2)

mihai

Reputation: 4742

OP here. I couldn't get pandoc to do this, but a little bash scripting with some help from awk does the trick. The awk script replaces every img tag with its Org-mode equivalent plus a caption, and pandoc leaves those lines alone when converting from HTML to Org.

The awk script,

# replace_img.awk
#
# Sample input:
#   <img src="/pics/01.png" alt="my very first pic"/>
# Sample output:
#   #+CAPTION: my very first pic
#   [[/pics/01.png]]
#
# Assumes each img tag sits on its own line, with src before alt.

BEGIN {
    # Split each line at double quotes: $2 is the src value, $4 the alt value.
    FS = "\""
}
# Replace every img tag (optionally indented) with an org-mode equivalent.
/^[ \t]*<img src/ {
    print "#+CAPTION: " $4
    print "[[" $2 "]]"
    next
}
# Leave the rest of the file intact.
{ print }

and the bash script,

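#!/bin/bash
# replace_img.sh
# Rewrite every .php file under the current directory in place.
# -print0 / read -d '' keeps file names containing spaces intact.

find . -name "*.php" -print0 | while IFS= read -r -d '' file; do
    awk -f replace_img.awk "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done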

Place both files at the root of the project, make the script executable with chmod +x replace_img.sh, and run it: ./replace_img.sh. Change the "*.php" pattern to match your file extension if needed; I had over 300 PHP files.

Upvotes: 0

mb21

Reputation: 39413

Currently the Org Writer unfortunately throws away the image alt and title strings. Feel free to submit an issue or patch if there's a way to do alt text in Org.

You can also write a filter that modifies the document AST and adds the alt text as an additional paragraph.
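A minimal sketch of such a filter, assuming pandoc's JSON filter interface (the AST arrives as JSON on stdin, the output format is passed as the first argument, and the result goes to stdout), the AST layout used by pandoc 1.18 and later, and one image per paragraph. Instead of adding a separate paragraph, it replaces each image-only paragraph with a raw Org block so the caption line sits directly above the link, as in the desired output. The file name caption_filter.py and all function names are hypothetical; only the Python standard library is used.

```python
#!/usr/bin/env python3
"""Hypothetical pandoc JSON filter: turn image-only paragraphs into
an Org '#+CAPTION:' line plus link, using the image's alt text."""
import json
import sys

WHITESPACE = ("Space", "SoftBreak", "LineBreak")
WRAPPERS = ("Emph", "Strong", "Strikeout", "Underline",
            "SmallCaps", "Superscript", "Subscript")


def stringify(inlines):
    """Flatten a list of pandoc inline elements into plain text."""
    parts = []
    for el in inlines:
        t, c = el.get("t"), el.get("c")
        if t == "Str":
            parts.append(c)
        elif t in WHITESPACE:
            parts.append(" ")
        elif t in WRAPPERS:
            parts.append(stringify(c))
        elif t in ("Span", "Quoted"):
            # Both store their inline content in the second slot.
            parts.append(stringify(c[1]))
    return "".join(parts)


def image_only(inlines):
    """Return the Image if `inlines` holds one image plus whitespace, else None."""
    content = [el for el in inlines if el.get("t") not in WHITESPACE]
    if len(content) == 1 and content[0].get("t") == "Image":
        return content[0]
    return None


def caption_images(blocks):
    """Return a new block list with captioned images rewritten as raw Org."""
    out = []
    for block in blocks:
        t = block.get("t")
        if t in ("Para", "Plain"):
            img = image_only(block["c"])
            if img is not None:
                # Image content is [attr, alt inlines, [src, title]].
                alt = stringify(img["c"][1])
                src = img["c"][2][0]
                if alt:
                    text = f"#+CAPTION: {alt}\n[[{src}]]"
                    block = {"t": "RawBlock", "c": ["org", text]}
        elif t == "Div":
            # Recurse into divs; Div content is [attr, blocks].
            block = {"t": "Div",
                     "c": [block["c"][0], caption_images(block["c"][1])]}
        out.append(block)
    return out


def main(output_format):
    doc = json.load(sys.stdin)
    # Raw Org blocks only make sense for Org output; pass others through.
    if output_format == "org":
        doc["blocks"] = caption_images(doc["blocks"])
    json.dump(doc, sys.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # pandoc calls a JSON filter with the output format as argv[1].
    main(sys.argv[1])
```

After chmod +x caption_filter.py, it would be run like this:

$ pandoc -s -r html index.html --filter ./caption_filter.py -o index.org

Images with empty alt text, and images that share a paragraph with other content, are left unchanged.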

Upvotes: 1
