PRAVEEN KUMAR

Reputation: 9

Hindi words length

I am trying to find the length of Hindi words in Python. For example, as far as I know, 'प्रवीण' has a length of 3.

w1 = 'प्रवीण'
print(len(w1))

I tried this code, but it did not give the result I expected.

Upvotes: -2

Views: 572

Answers (3)

Rohit Singla

Reputation: 122

Here is working Kotlin code corresponding to the pseudocode provided by Codeman. It gives you two things:

  1. The length of the string in terms of base characters
  2. The string split into parts, one per base character
const val HINDI_LETTERS = "कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ"

fun getHindiWordLength(word: String): Int {
    var count = 0
    for (i in word.indices) {
        println(word[i])    // Debug output: shows each character of the string on its own line
        // Count a base letter unless the character before it is a virama (half-letter join)
        if (word[i] in HINDI_LETTERS && (i == 0 || word[i - 1] != '्'))
            count++
    }
    return count
}

fun splitHindiWordOnBaseLetter(word: String): MutableList<String> {
    var curWord = ""
    val splitWords: MutableList<String> = mutableListOf()
    for (i in word.indices) {
        // Start a new part at each base letter that is not joined to the previous one by a virama
        if (i > 0 && word[i] in HINDI_LETTERS && word[i - 1] != '्') {
            splitWords.add(curWord)
            curWord = ""
        }
        curWord += word[i]
    }
    splitWords.add(curWord)         // Add the final part
    return splitWords
}

I tested this code on these inputs:

    println(getHindiWordLength("प्रवीण"))
    println(splitHindiWordOnBaseLetter("प्रवीण"))
    
    println(getHindiWordLength("आम"))
    println(splitHindiWordOnBaseLetter("आम"))
    
    println(getHindiWordLength("पेड़"))
    println(splitHindiWordOnBaseLetter("पेड़"))
    
    println(getHindiWordLength("अक्षर"))
    println(splitHindiWordOnBaseLetter("अक्षर"))
    
    println(getHindiWordLength("दिल"))
    println(splitHindiWordOnBaseLetter("दिल"))

This is the output I get:

प
्
र
व
ी
ण
3
[प्र, वी, ण]
आ
म
2
[आ, म]
प
े
ड
़
2
[पे, ड़]
अ
क
्
ष
र
3
[अ, क्ष, र]
द
ि
ल
2
[दि, ल]
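Since the question is about Python, here is a rough Python sketch of the same split. It is not a port of the Kotlin above: instead of a hand-written letter list it assumes that any code point in a Unicode "mark" category (vowel signs, virama, nukta and so on) belongs to the previous part, and that a consonant right after a virama joins the previous part too. The name split_aksharas is my own, and the sketch does not try to handle zero-width joiners or non-Devanagari text.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)


def split_aksharas(word):
    """Split a Devanagari word into parts, one per base letter (simplified rule)."""
    parts = []
    for pos, ch in enumerate(word):
        # Categories Mn/Mc/Me are combining marks: matras, virama, nukta, anusvara...
        is_mark = unicodedata.category(ch).startswith("M")
        # A letter right after a virama is the second half of a conjunct
        joined = pos > 0 and word[pos - 1] == VIRAMA
        if parts and (is_mark or joined):
            parts[-1] += ch      # extend the current part
        else:
            parts.append(ch)     # start a new part
    return parts


print(split_aksharas("प्रवीण"))
print(split_aksharas("अक्षर"))
```

On the five test words above this should give the same parts as the Kotlin version, and len(split_aksharas(word)) gives the base-letter count.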

Upvotes: 0

Codeman

Reputation: 507

As @betelgeuse has said, Hindi text does not work the way you think it does. Here's some working Python code that does what you expect, though:

w1 = 'प्रवीण'

def hindi_len(word):
    hindi_letts = 'कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ'
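    # Hindi letters that aren't half-letters or matras (vowel signs)
    count = 0
    for pos, ch in enumerate(word):
        # Count a base letter unless the character right before it is a virama (half-letter join).
        # pos is the letter's own position, so repeated letters and the first letter are handled correctly.
        if ch in hindi_letts and (pos == 0 or word[pos - 1] != '्'):
            count += 1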
    return count

print(hindi_len(w1))

This outputs 3. It's up to you to customize it as you'd like, though.

Edit: Make sure you use Python 3.x, or prefix Hindi strings with u in Python 2.x. I have seen encoding errors with non-Unicode strings in Python 2.x before.
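If you would rather not maintain a letter list by hand, a possible variation is to let the standard unicodedata module decide what is a mark. This is only a sketch under a simplified rule that I am assuming here: skip every combining mark, and skip any letter that directly follows a virama. The name akshara_count is made up for this example, and edge cases such as zero-width joiners are not covered.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)


def akshara_count(word):
    """Count base letters, ignoring marks and virama-joined consonants (simplified rule)."""
    count = 0
    for pos, ch in enumerate(word):
        # Matras, virama, nukta, anusvara etc. are all in a category starting with "M"
        is_mark = unicodedata.category(ch).startswith("M")
        # A letter right after a virama is part of the previous conjunct
        joined = pos > 0 and word[pos - 1] == VIRAMA
        if not is_mark and not joined:
            count += 1
    return count


print(akshara_count("प्रवीण"))  # 3
```

Because it looks at each character's own position, it also copes with repeated letters and with words that end in a virama.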

Upvotes: 2

betelgeuse

Reputation: 1266

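In Hindi (Devanagari script), what looks like a single character need not have length one, as it does in English. For example, वी is not one character but two code points combined into one: the consonant व followed by the vowel sign ी.

So in your case, Python reports the length of the word प्रवीण as 6 rather than 3, because len() counts code points.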

w1 = "प्रवीण"
for w in w1:
    print(w)

The output is:

प
्
र
व
ी
ण
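To see what each of those six pieces actually is, you can print the code point and its official Unicode name with the standard unicodedata module. A small sketch (describe_code_points is just a helper name I picked for this example):

```python
import unicodedata


def describe_code_points(word):
    """Return (character, code point, Unicode name) for every code point in word."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in word
    ]


w1 = "प्रवीण"
for ch, code_point, name in describe_code_points(w1):
    print(code_point, name)
```

The second entry is DEVANAGARI SIGN VIRAMA and the fifth is DEVANAGARI VOWEL SIGN II, which is why len() counts them separately even though they are drawn as part of the neighbouring letter.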

Upvotes: 0
