Reputation: 85
I am using Jsoup to parse an XML file stored on the filesystem, but after parsing, each <link> element is closed early and its text ends up outside the tag.
XML file:-
<movies>
<movie>
<id>0</id>
<name>Aag - 1948</name>
<link>http://www.songspk.pk/indian/aag_1948.html</link>
</movie>
<movie>
<id>1</id>
<name></name>
<link>#</link>
</movie>
<movie>
<id>2</id>
<name>Aa Ab Laut Chalain</name>
<link>http://www.songspk.pk/aa_ab_laut_chalein.html</link>
</movie>
<movie>
<id>3</id>
<name>Aag - RGV Ki Aag</name>
<link>http://www.songspk.pk/aag.html</link>
</movie>
</movies>
Java implementation:-
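import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class DownloadSongsList {
    private static Document document;
    public static void main(String... string) throws IOException {
        document = Jsoup.parse(new File("c:/movies.xml"), "UTF-8");
        Elements movies = document.getElementsByTag("movies");
        System.out.println(movies.html());
    }
}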
Output:-
<movie>
<id>
0
</id>
<name>
Aag - 1948
</name>
<link /> http://www.songspk.pk/indian/aag_1948.html
</movie>
<movie>
<id>
1
</id>
<name></name>
<link />#
</movie>
<movie>
<id>
2
</id>
<name>
Aa Ab Laut Chalain
</name>
<link />http://www.songspk.pk/aa_ab_laut_chalein.html
</movie>
<movie>
<id>
3
</id>
<name>
Aag - RGV Ki Aag
</name>
<link />http://www.songspk.pk/aag.html
</movie>
I want to extract the links but can't because of this problem. I would like to stick with Jsoup, because I use the same library to create these XML files.
Upvotes: 0
Views: 457
Reputation: 5538
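Have you tried using Parser.xmlParser()? By default Jsoup parses input as HTML, where <link> is an empty (void) element, so the parser closes it immediately and leaves the URL as loose text after it. The XML parser does not apply those HTML rules.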
Example:
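// Uses the InputStream overload: parse(InputStream, charsetName, baseUri, Parser). Needs java.io.FileInputStream and org.jsoup.parser.Parser.
Document doc = Jsoup.parse(new FileInputStream("c:/movies.xml"), "UTF-8", "", Parser.xmlParser());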
Elements movies = doc.getElementsByTag("movies");
System.out.println(movies.html());
Should output:
<movie>
<id>
0
</id>
<name>
Aag - 1948
</name>
<link>
http://www.songspk.pk/indian/aag_1948.html
</link>
</movie>
<movie>
<id>
1
</id>
<name></name>
<link>
#
</link>
</movie>
<movie>
<id>
2
</id>
<name>
Aa Ab Laut Chalain
</name>
<link>
http://www.songspk.pk/aa_ab_laut_chalein.html
</link>
</movie>
<movie>
<id>
3
</id>
<name>
Aag - RGV Ki Aag
</name>
<link>
http://www.songspk.pk/aag.html
</link>
</movie>
You can then extract the <link> tags normally:
Elements links = doc.getElementsByTag("link");
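To read the URL out of each tag, something like the following should work. This is only a minimal sketch: the class name MovieLinks and the helper extractLinks are made up for illustration, and it parses an in-memory string rather than c:/movies.xml so that it is self-contained. It assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question or of Jsoup.
public class MovieLinks {

    // Parses the input with Jsoup's XML parser and returns the text of every <link> element.
    static List<String> extractLinks(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> links = new ArrayList<>();
        for (Element link : doc.getElementsByTag("link")) {
            links.add(link.text());
        }
        return links;
    }

    public static void main(String[] args) {
        // Two entries taken from the XML in the question.
        String xml = "<movies>"
                + "<movie><id>0</id><name>Aag - 1948</name>"
                + "<link>http://www.songspk.pk/indian/aag_1948.html</link></movie>"
                + "<movie><id>1</id><name></name><link>#</link></movie>"
                + "</movies>";
        for (String url : extractLinks(xml)) {
            System.out.println(url);
        }
    }
}
```

Placeholder entries such as the "#" link are returned as-is, so you will probably want to skip those before downloading anything.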
Upvotes: 1