Reputation: 9
I am trying to find the length of Hindi words in Python. For example, as far as I know, 'प्रवीण' has a length of 3.
w1 = 'प्रवीण'
print(len(w1))
I tried this code, but it does not print 3.
Upvotes: -2
Views: 572
Reputation: 122
Here is working Kotlin code corresponding to the pseudocode provided by Codeman. It gives you two things: the length of a Hindi word, and the word split on its base letters.
const val HINDI_LETTERS = "कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ"
fun getHindiWordLength(word: String): Int {
    var count = 0
    for (i in word.indices) {
        println(word[i]) // Just to see what each character in the string looks like
        // Count a base letter unless it follows a virama (i.e. it is part of a conjunct)
        if (word[i] in HINDI_LETTERS && (i == 0 || word[i - 1] != '्')) {
            count++
        }
    }
    return count
}
fun splitHindiWordOnBaseLetter(word: String): MutableList<String> {
    var curWord = ""
    val splitWords: MutableList<String> = mutableListOf()
    for (i in word.indices) {
        // Start a new piece at each base letter that does not follow a virama
        if (word[i] in HINDI_LETTERS && (i > 0 && word[i - 1] != '्')) {
            splitWords.add(curWord)
            curWord = ""
        }
        curWord += word[i]
    }
    splitWords.add(curWord) // last piece
    return splitWords
}
I tested this code on these inputs:
println(getHindiWordLength("प्रवीण"))
println(splitHindiWordOnBaseLetter("प्रवीण"))
println(getHindiWordLength("आम"))
println(splitHindiWordOnBaseLetter("आम"))
println(getHindiWordLength("पेड़"))
println(splitHindiWordOnBaseLetter("पेड़"))
println(getHindiWordLength("अक्षर"))
println(splitHindiWordOnBaseLetter("अक्षर"))
println(getHindiWordLength("दिल"))
println(splitHindiWordOnBaseLetter("दिल"))
This is the output I get:
प
्
र
व
ी
ण
3
[प्र, वी, ण]
आ
म
2
[आ, म]
प
े
ड
़
2
[पे, ड़]
अ
क
्
ष
र
3
[अ, क्ष, र]
द
ि
ल
2
[दि, ल]
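Since the question is about Python, here is a rough Python sketch of the same splitting idea. It is not a line-by-line port: instead of a hard-coded letter list, it assumes that Unicode general category `Lo` (looked up with the standard `unicodedata` module) is a good enough stand-in for "base letter". The name `split_on_base_letters` is made up for illustration.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)


def split_on_base_letters(word):
    """Split a word into pieces that each start at a base letter.

    Assumption: any code point with general category 'Lo' is a base letter.
    A letter directly after a virama stays in the current piece (conjunct).
    """
    pieces = []
    for pos, ch in enumerate(word):
        is_letter = unicodedata.category(ch) == "Lo"
        follows_virama = pos > 0 and word[pos - 1] == VIRAMA
        if not pieces or (is_letter and not follows_virama):
            pieces.append(ch)   # start a new piece
        else:
            pieces[-1] += ch    # matra, virama, nukta, or conjunct member
    return pieces


pieces = split_on_base_letters("प्रवीण")
print(pieces, len(pieces))
```

The length of the returned list plays the role of `getHindiWordLength`. For प्रवीण it has three pieces, matching the Kotlin output above.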
Upvotes: 0
Reputation: 507
As @betelgeuse has said, Hindi does not function the way you think it does. Here's some pseudocode (working) to do what you expect though:
w1 = 'प्रवीण'
def hindi_len(word):
    hindi_letts = 'कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ'
    # Hindi base letters (no half-letters or matras)
    count = 0
    # enumerate gives the real position; word.index(ch) would return only the first occurrence
    for pos, ch in enumerate(word):
        # Count a base letter unless it follows a virama (i.e. it is part of a conjunct)
        if ch in hindi_letts and (pos == 0 or word[pos - 1] != '्'):
            count += 1
    return count
print(hindi_len(w1))
This outputs 3. It's up to you to customize it as you'd like, though.
Edit: Make sure you use Python 3.x, or prefix Hindi strings with u in Python 2.x. I have seen encoding errors with non-Unicode strings in Python 2.x before.
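To illustrate the encoding point, this small Python 3 sketch compares the number of code points in a string with the number of bytes in its UTF-8 encoding. A Python 2 byte string (no u prefix, UTF-8 source file) behaves like the encoded form, so a character-by-character check would see bytes rather than Hindi letters. The helper name `code_points_and_bytes` is only illustrative.

```python
def code_points_and_bytes(text):
    """Return (number of code points, number of UTF-8 bytes) for a str."""
    return len(text), len(text.encode('utf-8'))


w1 = 'प्रवीण'
# Each Devanagari code point takes 3 bytes in UTF-8
print(code_points_and_bytes(w1))
```

For प्रवीण this gives 6 code points but 18 bytes.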
Upvotes: 2
Reputation: 1266
In Hindi, what looks like one character need not have length one, as it does in English. For example, वी is not one character but two characters combined into one: व followed by the vowel sign ी.
So in your case, as far as len() is concerned, the word प्रवीण is not of length 3 but 6.
w1 = "प्रवीण"
for w in w1:
    print(w)
And the output would be:
प
्
र
व
ी
ण
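To see what each of those six pieces is, the standard `unicodedata` module can report the code point, Unicode name, and general category of every character. This is only an inspection aid, and `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(word):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in word
    ]


w1 = "प्रवीण"
for row in describe(w1):
    print(*row)
```

Four of the six code points are letters (category Lo); the other two are the virama, which joins प and र into one conjunct, and the vowel sign ी attached to व. That is why a reader counts three units while len() reports 6.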
Upvotes: 0