Reputation: 1793
How would I go about reading a file in C, then iterating through each character so I can evaluate it? For instance, given an input file containing 5 ≠ 10, I would evaluate that as "5 is not equal to 10" and print out true. I can do the evaluation part, but I'm unsure how to approach reading Unicode characters in C. I'm asking because I've written a larger lexer that I want to support Unicode, and I wanted to try it out on a smaller-scale project first to see how it goes.
Upvotes: 1
Views: 220
Reputation: 410
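UTF-8 is an encoding format for Unicode. What you're actually interested in is parsing the text and grouping its bytes into the sequences that each encode one character (in UTF-8 a character takes one to four bytes). Then you need to calculate the Unicode code point of each sequence to determine the character.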
Ultimately you need:
- A parser that can detect UTF-8 character boundaries.
- A decoder that converts the UTF-8 encoded data into Unicode code points.
- A reference list of code points and their semantic meanings.
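A minimal sketch of the first two pieces (finding character boundaries and decoding) in C, reading directly from a FILE *. The name utf8_next and the two sentinel values are hypothetical, not part of any library. It assumes the file is UTF-8 and simply reports malformed input (overlong forms, surrogates, values above U+10FFFF, truncated sequences) instead of trying to repair it.

```c
#include <stdio.h>

/* Hypothetical sentinel values returned by utf8_next(); both are outside the Unicode range. */
#define UTF8_EOF     (-1L)
#define UTF8_INVALID (-2L)

/* Read one UTF-8 encoded character from `in` and return its code point,
   UTF8_EOF at end of input, or UTF8_INVALID for a malformed sequence. */
long utf8_next(FILE *in)
{
    int c = fgetc(in);
    if (c == EOF)
        return UTF8_EOF;

    int extra;  /* number of continuation bytes that must follow */
    long cp;    /* code point being assembled */
    long min;   /* smallest code point allowed for this length (rejects overlong forms) */
    if (c < 0x80) {
        return c;                         /* 0xxxxxxx: plain ASCII */
    } else if ((c & 0xE0) == 0xC0) {      /* 110xxxxx: 2-byte sequence */
        extra = 1; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {      /* 1110xxxx: 3-byte sequence */
        extra = 2; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {      /* 11110xxx: 4-byte sequence */
        extra = 3; cp = c & 0x07; min = 0x10000;
    } else {
        return UTF8_INVALID;              /* stray continuation byte or 0xF8..0xFF */
    }

    for (int i = 0; i < extra; i++) {
        int b = fgetc(in);
        if (b == EOF || (b & 0xC0) != 0x80) {
            if (b != EOF)
                ungetc(b, in);            /* let the next call re-read this byte */
            return UTF8_INVALID;          /* truncated sequence */
        }
        cp = (cp << 6) | (b & 0x3F);      /* append 6 payload bits */
    }

    /* Reject overlong encodings, UTF-16 surrogates and values past U+10FFFF. */
    if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return UTF8_INVALID;
    return cp;
}
```

To walk a whole file you would open it in binary mode (fopen(path, "rb")) and call utf8_next in a loop until it returns UTF8_EOF, handing each code point to the lexer. Note that a byte-order mark at the start of the file, if present, comes through as U+FEFF and would need to be skipped.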
The Not Equal To sign is Unicode code point U+2260, which is encoded in UTF-8 as the three bytes 0xE2 0x89 0xA0.
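Those byte values follow from the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx: 0x2260 is 0010 0010 0110 0000 in binary, which splits into 4 + 6 + 6 bits as 0010 | 001001 | 100000, giving 0xE2, 0x89 and 0xA0. The sketch below does that bit packing in C for any code point; utf8_encode is a hypothetical helper name, useful mainly for checking encodings by hand or writing test input.

```c
#include <stddef.h>

/* Encode code point `cp` as UTF-8 into `out` (room for 4 bytes).
   Returns the number of bytes written, or 0 if `cp` is not encodable. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* UTF-16 surrogates are not valid here */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

Calling utf8_encode(0x2260, buf) fills buf with 0xE2 0x89 0xA0 and returns 3.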
EDIT: You should probably use a library for parsing UTF-8 text rather than writing your own decoder. What you should focus on is identifying the code points that are relevant to your application and interpreting their meaning within it.
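Once you have code points, giving them meaning can be as simple as a switch on their values. The sketch below assumes the input has already been decoded into an array of long code points; eval_comparison and its helpers are hypothetical names. It only handles non-negative integers without overflow checks, and the ≤ (U+2264) and ≥ (U+2265) cases are included purely as illustration.

```c
#include <stddef.h>

/* Advance *i past ASCII spaces. */
static void skip_spaces(const long *cps, size_t n, size_t *i)
{
    while (*i < n && cps[*i] == ' ')
        (*i)++;
}

/* Parse a non-negative decimal integer; returns -1 if no digit is found. */
static long parse_number(const long *cps, size_t n, size_t *i)
{
    long value = -1;
    while (*i < n && cps[*i] >= '0' && cps[*i] <= '9') {
        value = (value < 0 ? 0 : value * 10) + (cps[*i] - '0');
        (*i)++;
    }
    return value;
}

/* Evaluate "<number> <operator> <number>" given as decoded code points.
   Returns 1 for true, 0 for false, -1 for a syntax error or unknown operator. */
int eval_comparison(const long *cps, size_t n)
{
    size_t i = 0;
    skip_spaces(cps, n, &i);
    long lhs = parse_number(cps, n, &i);
    skip_spaces(cps, n, &i);
    if (lhs < 0 || i >= n)
        return -1;
    long op = cps[i++];                   /* the operator is a single code point */
    skip_spaces(cps, n, &i);
    long rhs = parse_number(cps, n, &i);
    skip_spaces(cps, n, &i);
    if (rhs < 0 || i != n)
        return -1;

    switch (op) {
    case '=':    return lhs == rhs;
    case '<':    return lhs < rhs;
    case '>':    return lhs > rhs;
    case 0x2260: return lhs != rhs;       /* U+2260 NOT EQUAL TO */
    case 0x2264: return lhs <= rhs;       /* U+2264 LESS-THAN OR EQUAL TO */
    case 0x2265: return lhs >= rhs;       /* U+2265 GREATER-THAN OR EQUAL TO */
    default:     return -1;               /* code point with no meaning here */
    }
}
```

For the input 5 ≠ 10 (code points '5', ' ', 0x2260, ' ', '1', '0') this returns 1, i.e. true. Keeping the operator table as a switch on code points means adding a new symbol is one extra case, which should carry over to a larger lexer.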
Upvotes: 2