Hardip Patel

Reputation: 85

Parsing links from an XML file using Jsoup

I am using Jsoup to parse an XML file stored on the filesystem, but after parsing, each <link> element is closed early and its URL ends up outside the tag.

XML file:

<movies>
    <movie>
        <id>0</id>
        <name>Aag - 1948</name>
        <link>http://www.songspk.pk/indian/aag_1948.html</link>
    </movie>
    <movie>
        <id>1</id>
        <name></name>
        <link>#</link>
    </movie>
    <movie>
        <id>2</id>
        <name>Aa Ab Laut Chalain</name>
        <link>http://www.songspk.pk/aa_ab_laut_chalein.html</link>
    </movie>
    <movie>
        <id>3</id>
        <name>Aag - RGV Ki Aag</name>
        <link>http://www.songspk.pk/aag.html</link>
    </movie>
</movies>

Java implementation:

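import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class DownloadSongsList {

    public static void main(String... args) throws IOException {
        // Jsoup.parse(File, String) uses the HTML parser by default.
        Document document = Jsoup.parse(new File("c:/movies.xml"), "UTF-8");

        Elements movies = document.getElementsByTag("movies");

        System.out.println(movies.html());
    }
}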

Output:

<movie> 
 <id>
  0
 </id> 
 <name>
  Aag - 1948
 </name> 
 <link /> http://www.songspk.pk/indian/aag_1948.html  
</movie> 
<movie> 
 <id>
  1
 </id> 
 <name></name> 
 <link /># 
</movie> 
<movie> 
 <id>
  2
 </id> 
 <name>
  Aa Ab Laut Chalain
 </name> 
 <link />http://www.songspk.pk/aa_ab_laut_chalein.html 
</movie> 
<movie> 
 <id>
  3
 </id> 
 <name>
  Aag - RGV Ki Aag
 </name> 
 <link />http://www.songspk.pk/aag.html 
</movie>

I want to extract the links but can't because of this problem. I would like to stick with Jsoup, because I use the same library to create these XML files.

Upvotes: 0

Views: 457

Answers (1)

ashatte

Reputation: 5538

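Have you tried using Parser.xmlParser()? By default Jsoup uses its HTML parser, which treats <link> as an empty (void) element, so the URL text ends up outside the tag. The XML parser makes no such assumption about tag names.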

Example:

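// Pass the XML parser explicitly via the InputStream overload (needs java.io.FileInputStream); "" is the base URI.
Document doc = Jsoup.parse(new FileInputStream("c:/movies.xml"), "UTF-8", "", Parser.xmlParser());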
Elements movies = doc.getElementsByTag("movies");
System.out.println(movies.html());

Should output:

<movie> 
 <id>
  0
 </id>
 <name>
  Aag - 1948
 </name>
 <link>
  http://www.songspk.pk/indian/aag_1948.html
 </link> 
</movie>
<movie> 
 <id>
  1
 </id> 
 <name></name> 
 <link>
  #
 </link> 
</movie> 
<movie> 
 <id>
  2
 </id> 
 <name>
  Aa Ab Laut Chalain
 </name> 
 <link>
  http://www.songspk.pk/aa_ab_laut_chalein.html
 </link> 
</movie> 
<movie> 
 <id>
  3
 </id> 
 <name>
  Aag - RGV Ki Aag
 </name> 
 <link>
  http://www.songspk.pk/aag.html
 </link> 
</movie>

So then you can extract the <link> tags normally:

Elements links = doc.getElementsByTag("link");
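
To go from the <link> elements to the actual URLs, something like the following may help. It is only a minimal sketch: the class name MovieLinks and the helper extractLinks are made up for illustration, the XML is passed in as a string rather than read from c:/movies.xml, and skipping the "#" placeholder is an assumption about what you want.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class MovieLinks {

    // Hypothetical helper: returns the text of every <link> element,
    // skipping empty values and the "#" placeholder (an assumption).
    static List<String> extractLinks(String xml) {
        // Parse with the XML parser so <link> keeps its text content.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> links = new ArrayList<>();
        for (Element link : doc.getElementsByTag("link")) {
            String url = link.text().trim();
            if (!url.isEmpty() && !url.equals("#")) {
                links.add(url);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Two entries taken from the question's movies.xml.
        String xml = "<movies>"
                + "<movie><id>0</id><name>Aag - 1948</name>"
                + "<link>http://www.songspk.pk/indian/aag_1948.html</link></movie>"
                + "<movie><id>1</id><name></name><link>#</link></movie>"
                + "</movies>";
        for (String url : extractLinks(xml)) {
            System.out.println(url);
        }
    }
}
```

With the default HTML parser, link.text() would come back empty for every entry, which is why the parser choice matters here.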

Upvotes: 1
