Reputation: 63
I'm trying to take a large file of Korean vocabulary and set it up to import smoothly into a flashcard program. The format of the file is [Korean word/phrase] [English translation] [Korean sample sentence]. Example:
너무 피곤해서 Because I’m tired 너무 피곤해서 잤어요.
I can write a macro to look for the first English letter and replace the space before it with a tab. I identified the first English letter by searching for the range [a-Z]. After that I want to locate the beginning of the sample sentence by searching for the next Korean character encountered, but what is the range for Korean characters?
I found a Unicode FAQ on Korean characters which seemed to suggest that each character is really just a combination of individual letters, and that some programming environments can treat it as those separate letters, but I probably misunderstood. The idea was that something like "식" is really the three letters "ㅅ" + "ㅣ" + "ㄱ". So I searched for just the one letter "ㅅ" (which appears in tons of characters in my input file) and got no hits. That had the potential to make things simple, but no dice.
Upvotes: 0
Views: 580
Reputation: 63
Okay, got it. The precomposed Hangul syllables run from U+AC00 to U+D7A3 (44032 to 55203 in decimal); I found the range here: http://en.wikipedia.org/wiki/Korean_language_and_computers#Hangul_in_Unicode
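As a side note on why searching for a lone "ㅅ" found nothing: as far as I can tell, each syllable in that range is stored as a single code point, and the individual letters can only be recovered by arithmetic on it. Here is a minimal sketch of that arithmetic. The function name DecomposeSyllable is hypothetical (not part of Word or VBA), and it assumes the text uses precomposed syllables, which the range search working suggests.

```vba
Option Explicit

' Hypothetical helper: splits one precomposed Hangul syllable into its
' conjoining jamo (leading consonant, vowel, optional trailing consonant).
Function DecomposeSyllable(ByVal syllable As String) As String
    Const S_BASE As Long = 44032    ' U+AC00, first precomposed syllable
    Const S_LAST As Long = 55203    ' U+D7A3, last precomposed syllable
    Const V_COUNT As Long = 21      ' number of vowels
    Const T_COUNT As Long = 28      ' number of trailing slots (0 = none)

    Dim code As Long
    ' AscW returns a signed Integer, so map it back to 0..65535
    code = AscW(syllable) And &HFFFF&

    If code < S_BASE Or code > S_LAST Then
        DecomposeSyllable = ""      ' not a precomposed Hangul syllable
        Exit Function
    End If

    Dim offset As Long, lead As Long, vowel As Long, tail As Long
    offset = code - S_BASE
    lead = offset \ (V_COUNT * T_COUNT)
    vowel = (offset Mod (V_COUNT * T_COUNT)) \ T_COUNT
    tail = offset Mod T_COUNT

    ' Conjoining jamo blocks: leads from U+1100, vowels from U+1161,
    ' trailing consonants from U+11A8 (tail index 1)
    DecomposeSyllable = ChrW(&H1100 + lead) & ChrW(&H1161 + vowel)
    If tail > 0 Then
        DecomposeSyllable = DecomposeSyllable & ChrW(&H11A7 + tail)
    End If
End Function
```

For "식" (U+C2DD, decimal 49885), DecomposeSyllable(ChrW(49885)) works out to lead index 9, vowel index 20, and tail index 1, i.e. U+1109, U+1175, and U+11A8. Those conjoining jamo are also different code points from the standalone "ㅅ" (U+3145) you get from the keyboard, so a plain text search for the letter will not match inside a syllable either way.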
The code below, from my macro, finds the next Korean character in a Word document:
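With Selection.Find
    ' Wildcard range covering every precomposed Hangul syllable (U+AC00 to U+D7A3)
    .Text = "[" & ChrW(44032) & "-" & ChrW(55203) & "]"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = True   ' required for the [x-y] range syntax
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
' Move the selection to the next Korean syllable, wrapping to the start of the document if needed
Selection.Find.Execute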
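For the original goal of putting a tab in front of the sample sentence, the same range can probably be used in a single wildcard replace rather than stepping through matches one at a time. This is only a sketch: the macro name TabBeforeSampleSentence is made up, and it assumes the English translation always ends in a plain Latin letter directly followed by one space and then the Korean sentence. Lines where the English ends in punctuation would be missed, and a Latin letter followed by a space inside the Korean text would be matched by mistake.

```vba
Option Explicit

' Hypothetical macro: replaces the space between a Latin letter and the
' next Hangul syllable with a tab, throughout the active document.
Sub TabBeforeSampleSentence()
    Dim hangul As String
    ' Same wildcard range as above: all precomposed Hangul syllables
    hangul = "[" & ChrW(44032) & "-" & ChrW(55203) & "]"

    Dim rng As Range
    Set rng = ActiveDocument.Content

    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' Group 1: last English letter; group 2: first Korean syllable
        .Text = "([A-Za-z]) (" & hangul & ")"
        ' Keep both characters and swap the space for a tab (^t)
        .Replacement.Text = "\1^t\2"
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

On the example line from the question, this should turn the space between "tired" and "너무 피곤해서 잤어요." into a tab and leave the rest alone. I would run it on a copy of the file first.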
Upvotes: 2