JohnM

regular expression differences in Java

I have the following HTML, from which I want to extract the currently playing artist and song title. My regular expression works in http://gskinner.com/RegExr/ and compiles correctly in Java, yet it doesn't match anything.

HTML snippet

<div class="audio_playing_title">Currently Playing.
  <div class="audio_home_box">
     <div class="audio_playing_stats">
        <div class="audio_playing">
           <div class="audio_dj_title">PRESENTER:
                AutoDJ - The Slogan
           </div>
          <div class="audio_track_title">SONG TITLE:
               The Artist Name - Song Name
          </div>
        </div>
     </div>
</div>

The Java code

String data = getWebsiteData(url);
data = data.replace("\\t", "");

Pattern pat = Pattern.compile("<div class=\"audio_track_title\">SONG TITLE:\r(.+)\r</div>");

Matcher matcher = pat.matcher(data);

if (matcher.matches())
{
    data = matcher.group(1);
}
else
{
    System.out.println("No match");
}
return data;

Answers (1)

Keppil

Your problem is that Matcher#matches() only returns true if the whole sequence matches your regex.

You need Matcher#find(), which will look for matching subsequences.
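A minimal sketch of the difference, using a made-up one-line page (the class name MatchesVsFind and the sample input are illustrative only): the same pattern fails under matches() because the input has extra markup around the div, but succeeds under find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Describes only the interesting part of the page, not the whole page
    static final Pattern TITLE =
            Pattern.compile("<div class=\"audio_track_title\">(.+?)</div>");

    // matches() succeeds only if the pattern covers the ENTIRE input
    public static boolean wholeInputMatches(String input) {
        Matcher m = TITLE.matcher(input);
        return m.matches();
    }

    // find() scans the input for the next subsequence that matches
    public static boolean somewhereMatches(String input) {
        Matcher m = TITLE.matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String page = "<body><div class=\"audio_track_title\">Song</div></body>";
        // <body> and </body> are not covered by the pattern, so matches() fails
        System.out.println(wholeInputMatches(page));
        // find() locates the div inside the larger input
        System.out.println(somewhereMatches(page));
    }
}
```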

I also think you would be better off using the Pattern#DOTALL flag, which lets . match line terminators too, instead of trying to match the line breaks yourself, since line endings differ between systems (\r\n on Windows, \n on Unix-like systems):

// DOTALL lets . match \r and \n, so no explicit line breaks are needed; trim() group(1) afterwards
Pattern pat = Pattern.compile("<div class=\"audio_track_title\">SONG TITLE:(.+?)</div>", Pattern.DOTALL);
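Putting both suggestions together, an end-to-end extraction might look roughly like the sketch below. It assumes the page is already available as a String (the question's getWebsiteData(url) is not shown, so a hard-coded excerpt of the snippet stands in for it), and it assumes artist and song are separated by the first " - ", which will misbehave for artist names that themselves contain " - ". The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NowPlaying {
    // DOTALL lets '.' match \r and \n, so the page's line-ending style does not matter.
    // The reluctant (.+?) stops at the first </div> after "SONG TITLE:".
    private static final Pattern TRACK = Pattern.compile(
            "<div class=\"audio_track_title\">SONG TITLE:(.+?)</div>", Pattern.DOTALL);

    /** Returns the trimmed text after "SONG TITLE:", or null if the div is not found. */
    public static String trackTitle(String html) {
        Matcher m = TRACK.matcher(html);
        // find(), not matches(): the div is only a small part of the page
        return m.find() ? m.group(1).trim() : null;
    }

    /** Splits "Artist - Song" at the first " - " (assumed separator). */
    public static String[] artistAndSong(String title) {
        int sep = title.indexOf(" - ");
        if (sep < 0) {
            return new String[] {title, ""};
        }
        return new String[] {title.substring(0, sep), title.substring(sep + 3)};
    }

    public static void main(String[] args) {
        // Windows-style line endings on purpose: the pattern does not care
        String html = "<div class=\"audio_playing\">\r\n"
                + "  <div class=\"audio_track_title\">SONG TITLE:\r\n"
                + "       The Artist Name - Song Name\r\n"
                + "  </div>\r\n"
                + "</div>";
        String title = trackTitle(html);
        if (title == null) {
            System.out.println("No match");
            return;
        }
        String[] parts = artistAndSong(title);
        System.out.println("Artist: " + parts[0]);
        System.out.println("Song:   " + parts[1]);
    }
}
```

Separately, data.replace("\\t", "") in the question removes the two-character sequence backslash followed by t, not tab characters (that would be "\t"); with DOTALL and trim() the tabs no longer need removing at all.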
