FRR

Reputation: 99

How to print UTF-8 encoded characters in JSoup

I am using JSoup version 1.8.1 and want to parse content from http://corpus.quran.com/wordbyword.jsp. The page is encoded in UTF-8 and contains Arabic characters. I've written the following code, which prints some of the content to the console.

Wherever the HTML file contains special characters, I get '?' characters in their place. I don't know how to solve this problem.

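import java.io.IOException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        String url = "http://corpus.quran.com/wordbyword.jsp";
        System.out.printf("Fetching %s...\n", url);

        Document doc = null;
        try {
            // Parse the response stream as UTF-8, using the URL as the base URI
            doc = Jsoup.parse(new URL(url).openStream(), "UTF-8", url);
            //doc = Jsoup.connect(url).get();
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
        System.out.println("Fetching completed. Collecting Data...");
        // Print the tag name and text of every table cell
        Elements columns = doc.select("td");
        for (Element cell : columns) {
            System.out.printf(" * %s %s\n", cell.tagName(), cell.text());
        }
        System.out.println("------------------------------------------");
    }
}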

And the output in the console:

Fetching http://corpus.quran.com/wordbyword.jsp...
Fetching completed. Collecting Data...
 * td  
 * td Qur'an | Word by Word | Audio | Prayer Times | Android | New : beta.quran.com
 * td 
 * td __
 * td Sign In
 * td Search
 * td  
 * td 
 * td 
 * td __
 * td Verse (1:1) - Word by Word
 * td Word by Word Quran Dictionary English Translation Syntactic Treebank Ontology of Concepts Documentation Quranic Grammar Message Board Resources Feedback Java API
 * td Word by Word
 * td Quran Dictionary
 * td English Translation
 * td Syntactic Treebank
 * td Ontology of Concepts
 * td Documentation
 * td Quranic Grammar
 * td Message Board
 * td Resources
 * td Feedback
 * td Java API
 * td __
 * td Welcome to the Quranic Arabic Corpus, an annotated linguistic resource which shows the Arabic grammar, syntax and morphology for each word in the Holy Quran. Click on an Arabic word below to see details of the word's grammar, or to suggest a correction. Chapter (1) s?rat l-f?ti?ah (The Opening)Chapter (2) s?rat l-baqarah (The Cow)Chapter (3) s?rat ?l ?im'r?n (The Family of Imr?n)Chapter (4) s?rat l-nis?a (The Women)Chapter (5) s?rat l-m?idah (The Table spread with Food)Chapter (6) s?rat l-an??m (The Cattle)Chapter (7) s?rat l-a?r?f (The Heights)Chapter (8) s?rat l-anf?l (The Spoils of War)Chapter (9) s?rat l-tawbah (The Repentance)Chapter (10) s?rat y?nus (Jonah)Chapter (11) s?rat h?d (Hud)Chapter (12) s?rat y?suf (Joseph)Chapter (13) s?rat l-ra?d (The Thunder)Chapter (14) s?rat ib'r?h?m (Abraham)Chapter (15) s?rat l-?ij'r (The Rocky Tract)Chapter (16) s?rat l-na?l (The Bees)Chapter (17) s?rat l-isr? (The Night Journey)Chapter (18) s?rat l-kahf (The Cave)Chapter (19) s?rat maryam (Mary)Chapter (20) s?rat ?? h?Chapter (21) s?rat l-anbiy?a (The Prophets)Chapter (22) s?rat l-?aj (The Pilgrimage)Chapter (23) s?rat l-mu'min?n (The Believers)Chapter (24) s?rat l-n?r (The Light)Chapter (25) s?rat l-fur'q?n (The Criterion)Chapter (26) s?rat l-shu?ar? (The Poets)Chapter (27) s?rat l-naml (The Ants)Chapter (28) s?rat l-qa?a? (The Stories)Chapter (29) s?rat l-?ankab?t (The Spider)Chapter (30) s?rat l-r?m (The Romans)Chapter (31) s?rat luq'm?nChapter (32) s?rat l-sajdah (The Prostration)Chapter (33) s?rat l-a?z?b (The Combined Forces)Chapter (34) s?rat saba (Sheba)Chapter (35) s?rat f??ir (The Originator)Chapter (36) s?rat y? s?nChapter (37) s?rat l-??f?t (Those Ranges in Ranks)Chapter (38) s?rat ??dChapter (39) s?rat l-zumar (The Groups)Chapter (40) s?rat gh?fir (The Forgiver God)Chapter (41) s?rat fu??ilat (Explained in Detail)Chapter (42) s?rat l-sh?r? 
(Consultation)Chapter (43) s?rat l-zukh'ruf (The Gold Adornment)Chapter (44) s?rat l-dukh?n (The Smoke)Chapter (45) s?rat l-j?thiyah (Crouching)Chapter (46) s?rat l-a?q?f (The Curved Sand-hills)Chapter (47) s?rat mu?ammadChapter (48) s?rat l-fat? (The Victory)Chapter (49) s?rat l-?ujur?t (The Dwellings)Chapter (50) s?rat q?fChapter (51) s?rat l-dh?riy?t (The Wind that Scatter)Chapter (52) s?rat l-??r (The Mount)Chapter (53) s?rat l-najm (The Star)Chapter (54) s?rat l-qamar (The Moon)Chapter (55) s?rat l-ra?m?n (The Most Gracious)Chapter (56) s?rat l-w?qi?ah (The Event)Chapter (57) s?rat l-?ad?d (The Iron)Chapter (58) s?rat l-muj?dilah (She That Disputeth)Chapter (59) s?rat l-?ashr (The Gathering)Chapter (60) s?rat l-mum'ta?anah (The Woman to be examined)Chapter (61) s?rat l-?af (The Row)Chapter (62) s?rat l-jumu?ah (Friday)Chapter (63) s?rat l-mun?fiq?n (The Hypocrites)Chapter (64) s?rat l-tagh?bun (Mutual Loss & Gain)Chapter (65) s?rat l-?al?q (The Divorce)Chapter (66) s?rat l-ta?r?m (The Prohibition)Chapter (67) s?rat l-mulk (Dominion)Chapter (68) s?rat l-qalam (The Pen)Chapter (69) s?rat l-??qah (The Inevitable)Chapter (70) s?rat l-ma??rij (The Ways of Ascent)Chapter (71) s?rat n??Chapter (72) s?rat l-jin (The Jinn)Chapter (73) s?rat l-muzamil (The One wrapped in Garments)Chapter (74) s?rat l-mudathir (The One Enveloped)Chapter (75) s?rat l-qiy?mah (The Resurrection)Chapter (76) s?rat l-ins?n (Man)Chapter (77) s?rat l-mur'sal?t (Those sent forth)Chapter (78) s?rat l-naba (The Great News)Chapter (79) s?rat l-n?zi??t (Those who Pull Out)Chapter (80) s?rat ?abasa (He frowned)Chapter (81) s?rat l-takw?r (The Overthrowing)Chapter (82) s?rat l-infi??r (The Cleaving)Chapter (83) s?rat l-mu?afif?n (Those Who Deal in Fraud)Chapter (84) s?rat l-inshiq?q (The Splitting Asunder)Chapter (85) s?rat l-bur?j (The Big Stars)Chapter (86) s?rat l-??riq (The Night-Comer)Chapter (87) s?rat l-a?l? 
(The Most High)Chapter (88) s?rat l-gh?shiyah (The Overwhelming)Chapter (89) s?rat l-fajr (The Dawn)Chapter (90) s?rat l-balad (The City)Chapter (91) s?rat l-shams (The Sun)Chapter (92) s?rat l-layl (The Night)Chapter (93) s?rat l-?u?? (The Forenoon)Chapter (94) s?rat l-shar? (The Opening Forth)Chapter (95) s?rat l-t?n (The Fig)Chapter (96) s?rat l-?alaq (The Clot)Chapter (97) s?rat l-qadr (The Night of Decree)Chapter (98) s?rat l-bayinah (The Clear Evidence)Chapter (99) s?rat l-zalzalah (The Earthquake)Chapter (100) s?rat l-??diy?t (Those That Run)Chapter (101) s?rat l-q?ri?ah (The Striking Hour)Chapter (102) s?rat l-tak?thur (The piling Up)Chapter (103) s?rat l-?a?r (Time)Chapter (104) s?rat l-humazah (The Slanderer)Chapter (105) s?rat l-f?l (The Elephant)Chapter (106) s?rat qurayshChapter (107) s?rat l-m???n (Small Kindnesses)Chapter (108) s?rat l-kawthar (A River in Paradise)Chapter (109) s?rat l-k?fir?n (The Disbelievers)Chapter (110) s?rat l-na?r (The Help)Chapter (111) s?rat l-masad (The Palm Fibre)Chapter (112) s?rat l-ikhl?? (Sincerity)Chapter (113) s?rat l-falaq (The Daybreak)Chapter (114) s?rat l-n?s (Mankind) Verse (1:1)Verse (1:2)Verse (1:3)Verse (1:4)Verse (1:5)Verse (1:6)Verse (1:7) Go Chapter (1) s?rat l-f?ti?ah (The Opening) Translation Arabic word Syntax and morphology (1:1:1) bis'mi In (the) name P – prefixed preposition bi N – genitive masculine noun ??? ?????? (1:1:2) l-lahi (of) Allah, PN – genitive proper noun ? Allah ??? ??????? ????? (1:1:3) l-ra?m?ni the Most Gracious, ADJ – genitive masculine singular adjective ??? ?????? (1:1:4) l-ra??mi the Most Merciful. ADJ – genitive masculine singular adjective ??? ?????? (1:2:1) al-?amdu All praises and thanks N – nominative masculine noun ??? ????? (1:2:2) lillahi (be) to Allah, P – prefixed preposition l?m PN – genitive proper noun ? Allah ??? ?????? (1:2:3) rabbi the Lord N – genitive masculine noun ??? ????? (1:2:4) l-??lam?na of the universe N – genitive masculine plural noun ??? ????? 
(1:3:1) al-ra?m?ni The Most Gracious, ADJ – genitive masculine singular adjective ??? ?????? (1:3:2) l-ra??mi the Most Merciful. ADJ – genitive masculine singular adjective ??? ?????? (1:4:1) m?liki (The) Master N – genitive masculine active participle ??? ????? (1:4:2) yawmi (of the) Day N – genitive masculine noun ? Day of Resurrection ??? ????? (1:4:3) l-d?ni (of the) Judgment. N – genitive masculine noun ??? ????? (1:5:1) iyy?ka You Alone PRON – 2nd person masculine singular personal pronoun ? Allah ???? ????? (1:5:2) na?budu we worship, V – 1st person plural imperfect verb ??? ????? (1:5:3) wa-iyy?ka and You Alone CONJ – prefixed conjunction wa (and) PRON – 2nd person masculine singular personal pronoun ? Allah ????? ????? ???? ????? (1:5:4) nasta??nu we ask for help. V – 1st person plural (form X) imperfect verb ??? ????? (1:6:1) ih'din? Guide us V – 2nd person masculine singular imperative verb PRON – 1st person plural object pronoun PRON – implicit subject pronoun ? Allah ??? ??? ?«??» ???? ???? ?? ??? ??? ????? ?? (1:6:2) l-?ir??a (to) the path, N – accusative masculine noun ??? ????? (1:6:3) l-mus'taq?ma the straight. ADJ – accusative masculine (form X) active participle ??? ?????? Quran Recitation by Saad Al-Ghamadi Verse 1-6 | 7
 * td Chapter (1) s?rat l-f?ti?ah (The Opening)Chapter (2) s?rat l-baqarah (The Cow)Chapter (3) s?rat ?l ?im'r?n (The Family of Imr?n)Chapter (4) s?rat l-nis?a (The Women)Chapter (5) s?rat l-m?idah (The Table spread with Food)Chapter (6) s?rat l-an??m (The Cattle)Chapter (7) s?rat l-a?r?f (The Heights)Chapter (8) s?rat l-anf?l (The Spoils of War)Chapter (9) s?rat l-tawbah (The Repentance)Chapter (10) s?rat y?nus (Jonah)Chapter (11) s?rat h?d (Hud)Chapter (12) s?rat y?suf (Joseph)Chapter (13) s?rat l-ra?d (The Thunder)Chapter (14) s?rat ib'r?h?m (Abraham)Chapter (15) s?rat l-?ij'r (The Rocky Tract)Chapter (16) s?rat l-na?l (The Bees)Chapter (17) s?rat l-isr? (The Night Journey)Chapter (18) s?rat l-kahf (The Cave)Chapter (19) s?rat maryam (Mary)Chapter (20) s?rat ?? h?Chapter (21) s?rat l-anbiy?a (The Prophets)Chapter (22) s?rat l-?aj (The Pilgrimage)Chapter (23) s?rat l-mu'min?n (The Believers)Chapter (24) s?rat l-n?r (The Light)Chapter (25) s?rat l-fur'q?n (The Criterion)Chapter (26) s?rat l-shu?ar? (The Poets)Chapter (27) s?rat l-naml (The Ants)Chapter (28) s?rat l-qa?a? (The Stories)Chapter (29) s?rat l-?ankab?t (The Spider)Chapter (30) s?rat l-r?m (The Romans)Chapter (31) s?rat luq'm?nChapter (32) s?rat l-sajdah (The Prostration)Chapter (33) s?rat l-a?z?b (The Combined Forces)Chapter (34) s?rat saba (Sheba)Chapter (35) s?rat f??ir (The Originator)Chapter (36) s?rat y? s?nChapter (37) s?rat l-??f?t (Those Ranges in Ranks)Chapter (38) s?rat ??dChapter (39) s?rat l-zumar (The Groups)Chapter (40) s?rat gh?fir (The Forgiver God)Chapter (41) s?rat fu??ilat (Explained in Detail)Chapter (42) s?rat l-sh?r? (Consultation)Chapter (43) s?rat l-zukh'ruf (The Gold Adornment)Chapter (44) s?rat l-dukh?n (The Smoke)Chapter (45) s?rat l-j?thiyah (Crouching)Chapter (46) s?rat l-a?q?f (The Curved Sand-hills)Chapter (47) s?rat mu?ammadChapter (48) s?rat l-fat? 
(The Victory)Chapter (49) s?rat l-?ujur?t (The Dwellings)Chapter (50) s?rat q?fChapter (51) s?rat l-dh?riy?t (The Wind that Scatter)Chapter (52) s?rat l-??r (The Mount)Chapter (53) s?rat l-najm (The Star)Chapter (54) s?rat l-qamar (The Moon)Chapter (55) s?rat l-ra?m?n (The Most Gracious)Chapter (56) s?rat l-w?qi?ah (The Event)Chapter (57) s?rat l-?ad?d (The Iron)Chapter (58) s?rat l-muj?dilah (She That Disputeth)Chapter (59) s?rat l-?ashr (The Gathering)Chapter (60) s?rat l-mum'ta?anah (The Woman to be examined)Chapter (61) s?rat l-?af (The Row)Chapter (62) s?rat l-jumu?ah (Friday)Chapter (63) s?rat l-mun?fiq?n (The Hypocrites)Chapter (64) s?rat l-tagh?bun (Mutual Loss & Gain)Chapter (65) s?rat l-?al?q (The Divorce)Chapter (66) s?rat l-ta?r?m (The Prohibition)Chapter (67) s?rat l-mulk (Dominion)Chapter (68) s?rat l-qalam (The Pen)Chapter (69) s?rat l-??qah (The Inevitable)Chapter (70) s?rat l-ma??rij (The Ways of Ascent)Chapter (71) s?rat n??Chapter (72) s?rat l-jin (The Jinn)Chapter (73) s?rat l-muzamil (The One wrapped in Garments)Chapter (74) s?rat l-mudathir (The One Enveloped)Chapter (75) s?rat l-qiy?mah (The Resurrection)Chapter (76) s?rat l-ins?n (Man)Chapter (77) s?rat l-mur'sal?t (Those sent forth)Chapter (78) s?rat l-naba (The Great News)Chapter (79) s?rat l-n?zi??t (Those who Pull Out)Chapter (80) s?rat ?abasa (He frowned)Chapter (81) s?rat l-takw?r (The Overthrowing)Chapter (82) s?rat l-infi??r (The Cleaving)Chapter (83) s?rat l-mu?afif?n (Those Who Deal in Fraud)Chapter (84) s?rat l-inshiq?q (The Splitting Asunder)Chapter (85) s?rat l-bur?j (The Big Stars)Chapter (86) s?rat l-??riq (The Night-Comer)Chapter (87) s?rat l-a?l? (The Most High)Chapter (88) s?rat l-gh?shiyah (The Overwhelming)Chapter (89) s?rat l-fajr (The Dawn)Chapter (90) s?rat l-balad (The City)Chapter (91) s?rat l-shams (The Sun)Chapter (92) s?rat l-layl (The Night)Chapter (93) s?rat l-?u?? (The Forenoon)Chapter (94) s?rat l-shar? 
(The Opening Forth)Chapter (95) s?rat l-t?n (The Fig)Chapter (96) s?rat l-?alaq (The Clot)Chapter (97) s?rat l-qadr (The Night of Decree)Chapter (98) s?rat l-bayinah (The Clear Evidence)Chapter (99) s?rat l-zalzalah (The Earthquake)Chapter (100) s?rat l-??diy?t (Those That Run)Chapter (101) s?rat l-q?ri?ah (The Striking Hour)Chapter (102) s?rat l-tak?thur (The piling Up)Chapter (103) s?rat l-?a?r (Time)Chapter (104) s?rat l-humazah (The Slanderer)Chapter (105) s?rat l-f?l (The Elephant)Chapter (106) s?rat qurayshChapter (107) s?rat l-m???n (Small Kindnesses)Chapter (108) s?rat l-kawthar (A River in Paradise)Chapter (109) s?rat l-k?fir?n (The Disbelievers)Chapter (110) s?rat l-na?r (The Help)Chapter (111) s?rat l-masad (The Palm Fibre)Chapter (112) s?rat l-ikhl?? (Sincerity)Chapter (113) s?rat l-falaq (The Daybreak)Chapter (114) s?rat l-n?s (Mankind)
 * td Verse (1:1)Verse (1:2)Verse (1:3)Verse (1:4)Verse (1:5)Verse (1:6)Verse (1:7) Go
 * td Translation
 * td Arabic word
 * td Syntax and morphology
 * td (1:1:1) bis'mi In (the) name
 * td 
 * td P – prefixed preposition bi N – genitive masculine noun ??? ??????
 * td (1:1:2) l-lahi (of) Allah,
 * td 
 * td PN – genitive proper noun ? Allah ??? ??????? ?????
 * td (1:1:3) l-ra?m?ni the Most Gracious,
 * td 
 * td ADJ – genitive masculine singular adjective ??? ??????
 * td (1:1:4) l-ra??mi the Most Merciful.
 * td 
 * td ADJ – genitive masculine singular adjective ??? ??????
 * td (1:2:1) al-?amdu All praises and thanks
 * td 
 * td N – nominative masculine noun ??? ?????
 * td (1:2:2) lillahi (be) to Allah,
 * td 
 * td P – prefixed preposition l?m PN – genitive proper noun ? Allah ??? ??????
 * td (1:2:3) rabbi the Lord
 * td 
 * td N – genitive masculine noun ??? ?????
 * td (1:2:4) l-??lam?na of the universe
 * td 
 * td N – genitive masculine plural noun ??? ?????
 * td (1:3:1) al-ra?m?ni The Most Gracious,
 * td 
 * td ADJ – genitive masculine singular adjective ??? ??????
 * td (1:3:2) l-ra??mi the Most Merciful.
 * td 
 * td ADJ – genitive masculine singular adjective ??? ??????
 * td (1:4:1) m?liki (The) Master
 * td 
 * td N – genitive masculine active participle ??? ?????
 * td (1:4:2) yawmi (of the) Day
 * td 
 * td N – genitive masculine noun ? Day of Resurrection ??? ?????
 * td (1:4:3) l-d?ni (of the) Judgment.
 * td 
 * td N – genitive masculine noun ??? ?????
 * td (1:5:1) iyy?ka You Alone
 * td 
 * td PRON – 2nd person masculine singular personal pronoun ? Allah ???? ?????
 * td (1:5:2) na?budu we worship,
 * td 
 * td V – 1st person plural imperfect verb ??? ?????
 * td (1:5:3) wa-iyy?ka and You Alone
 * td 
 * td CONJ – prefixed conjunction wa (and) PRON – 2nd person masculine singular personal pronoun ? Allah ????? ????? ???? ?????
 * td (1:5:4) nasta??nu we ask for help.
 * td 
 * td V – 1st person plural (form X) imperfect verb ??? ?????
 * td (1:6:1) ih'din? Guide us
 * td 
 * td V – 2nd person masculine singular imperative verb PRON – 1st person plural object pronoun PRON – implicit subject pronoun ? Allah ??? ??? ?«??» ???? ???? ?? ??? ??? ????? ??
 * td (1:6:2) l-?ir??a (to) the path,
 * td 
 * td N – accusative masculine noun ??? ?????
 * td (1:6:3) l-mus'taq?ma the straight.
 * td 
 * td ADJ – accusative masculine (form X) active participle ??? ??????
 * td Language Research Group University of Leeds
 * td __
 * td Copyright © Kais Dukes, 2009-2011. E-mail: [email protected]. This is an open source project. The Quranic Arabic Corpus is available under the GNU public license with terms of use.
------------------------------------------

Upvotes: 1

Views: 570

Answers (1)

Juned Ahsan

Reputation: 68715

Most consoles do not use UTF-8 as their default encoding, so when a program prints a character that the console's encoding cannot represent, that character is replaced with '?'. You can change the console's encoding, though. In Eclipse, for example, go to:

Run Configurations -> Common -> Encoding -> Other (select UTF-8 from the drop-down)

Run your program again and the UTF-8 characters should now display properly in the Eclipse console.
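
Outside Eclipse, a related approach is to make the program itself write UTF-8 bytes instead of relying on the platform default. The following is only a minimal sketch, not part of the original question's code: the class name Utf8ConsoleSketch and the helpers utf8Stream and printedBytes are hypothetical names chosen for illustration. It first shows how an encoding that cannot represent a character substitutes '?', then prints the same text through a PrintStream that encodes as UTF-8. It assumes the terminal is set to decode UTF-8 and has a font that covers the characters; if not, the output can still look wrong.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8ConsoleSketch {

    // Hypothetical helper: wrap a stream so that printed text is encoded
    // as UTF-8 rather than in the platform default encoding.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: return the bytes produced by printing `text`
    // through a PrintStream that uses the named charset.
    static byte[] printedBytes(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // "s" + U+016B (u with macron) + "rat", as in the page's transliteration
        String sample = "s\u016Brat";

        // US-ASCII cannot represent U+016B, so the encoder substitutes '?'
        byte[] ascii = printedBytes(sample, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints s?rat

        // The same text written as UTF-8 bytes; whether it displays correctly
        // depends on the console's own encoding setting
        PrintStream out = utf8Stream(System.out);
        out.println(sample);
    }
}
```

In the question's program, the equivalent change would be to create such a stream once and call printf on it instead of on System.out.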

Upvotes: 4
