Reputation: 159
I am not sure if this question qualifies here, but it seems odd to me that the letter 'f' often gets messed up when copied from PDF text.
I do research as a student, and I read a lot of papers. This happens a lot when I want to copy the name of a paper to rename the pdf file.
For example, I opened a paper in the built-in PDF viewer plug-in of Chrome on a MacBook Pro with OS X 10.9. Try copying the title of the paper and pasting it. The 'f' in 'fluids' will be missing.
Upvotes: 11
Views: 9636
Reputation: 2215
I think the reason why @warriormole can't copy "fl" is not the use of ligatures itself, but neglect or oversight on the part of the PDF file's creators. That was acceptable 10-15 or more years ago, when everyone was happy just to have a 'picture' in the PDF and no one thought about content extraction, or about preserving the logical text rather than only its visual appearance in the long term. For a file created in 2010, though, it's a shame.
PDF provides ways to store the Unicode representation of every glyph used, so the file in question can be fixed relatively easily.
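To illustrate the idea, here is a toy sketch in Python of what such a glyph-to-Unicode table (in PDF terms, a font's ToUnicode mapping) does during copy. The glyph IDs, both tables, and the `extract_text` helper are made up for illustration; nothing here is read from the actual file, and real viewers behave in more varied ways when a mapping is missing.

```python
# Toy model only: the glyph IDs and both tables are hypothetical,
# not taken from any real PDF.
GLYPHS_ON_PAGE = [17, 42, 5, 8, 3]  # drawn as: "fl" ligature, u, i, d, s

# Incomplete table: the "fl" ligature glyph (17) has no Unicode entry.
BROKEN_MAP = {42: "u", 5: "i", 8: "d", 3: "s"}

# Repaired table: glyph 17 maps to the two characters "f" + "l".
FIXED_MAP = {**BROKEN_MAP, 17: "fl"}


def extract_text(glyph_ids, to_unicode):
    """Return the text a viewer could copy; unmapped glyphs are dropped."""
    return "".join(to_unicode.get(g, "") for g in glyph_ids)


print(extract_text(GLYPHS_ON_PAGE, BROKEN_MAP))  # uids
print(extract_text(GLYPHS_ON_PAGE, FIXED_MAP))   # fluids
```

The page looks identical in both cases, because the same glyphs are drawn; only the copied text differs.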
Upvotes: 6
Reputation: 4033
Not only the "f" will be missing; the whole "fl" will.
The reason for this is so-called "ligatures". In order to look nice, some combinations of letters, most notably "fi", get combined into a single character. This special character is rarely treated correctly when copy-pasting. You can see this below: if you try to select the ligature, you will notice it is only one "letter". Note that your computer may also render the two separate letters using the ligature.
The following is a "fi" ligature: ﬁ
The following is two letters: fi
Especially visible in a fixed-width font:
The following is a "fi" ligature: ﬁ
The following is two letters: fi
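If the ligature character does survive the copy (as a single character such as U+FB01 or U+FB02), it can be expanded back into plain letters afterwards. Below is a minimal Python sketch using the standard library's Unicode compatibility normalization; the `expand_ligatures` name and the sample title are made up for illustration. It cannot help when the viewer drops the ligature entirely, as in the question, and NFKC also rewrites other compatibility characters (for example superscripts), so it is a blunt tool.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand compatibility characters, including the fi/fl ligatures."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical pasted text containing U+FB02 (fl) and U+FB01 (fi).
pasted = "Complex \ufb02uids and thin \ufb01lms"
print(expand_ligatures(pasted))  # Complex fluids and thin films
```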
Upvotes: 13