Quincy Kwende

Reputation: 76

Regular Expressions to get text between tags

I am writing an application that gets the title of an HTML page, some text under the body tag and an image, something like Facebook's share feature. Is there a regular expression that does that? Thanks for your assistance.

Upvotes: 3

Views: 7403

Answers (3)

Klemen Tusar

Reputation: 9689

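I just coined this expression, which gets the text inside tags (the node value) without the tags themselves. Note that the lookbehind only matches text that directly follows "> so it works for opening tags that end in a double-quoted attribute, such as <a href="...">, but not for bare tags like <title>.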

(?<=\"\>)(.*?)(?=\<\/)

You can see it in action with PHP here: http://codepad.viper-7.com/AUTcv3
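The same pattern can be tried in Python's standard re module, as a minimal sketch in place of the PHP demo. The sample HTML string and the helper name node_texts are made up for illustration. Python treats the backslash-escaped ", > and < characters as plain literals, so the pattern string is used unchanged.

```python
import re

# The pattern from the answer above, unchanged. In Python's re, \" \> \< \/
# are just the literal characters ", >, < and /.
NODE_TEXT = re.compile(r'(?<=\"\>)(.*?)(?=\<\/)')


def node_texts(html):
    """Hypothetical helper: return every run of text that directly follows
    '">' and ends just before the next '</'."""
    return NODE_TEXT.findall(html)


# Made-up sample markup for illustration only.
sample = '<p class="intro">Hello</p><title>Ignored</title><a href="/x">Link</a>'
print(node_texts(sample))  # ['Hello', 'Link']
```

The title text is skipped because "<title>" does not end in "> which shows how much the result depends on the exact markup.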

Upvotes: 1

Jens

Reputation: 25563

You should probably use an HTML parser instead of regular expressions. See Simple HTML DOM, for example.

A regular expression for your task will be very hard to maintain and will break easily whenever the pages in question change, not to mention that it cannot account for HTML comments.
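As a rough illustration of the parser approach, here is a sketch that uses Python's standard html.parser module instead of Simple HTML DOM (which is a PHP library). It collects the title, the text of the first paragraph inside body, and the src of the first image. The class name ShareInfoParser and the choice of "first <p>" as "some text under the body" are assumptions made for this example. Because comments go to handle_comment rather than handle_data, a commented-out title is ignored automatically.

```python
from html.parser import HTMLParser


class ShareInfoParser(HTMLParser):
    """Hypothetical parser: collects the <title> text, the text of the first
    <p> inside <body>, and the src attribute of the first <img>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.first_paragraph = ""
        self.image_src = None
        self._in_title = False
        self._in_body = False
        self._in_p = False
        self._p_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "p" and self._in_body and not self._p_done:
            self._in_p = True
        elif tag == "img" and self.image_src is None:
            # attrs is a list of (name, value) pairs
            self.image_src = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            self._p_done = True
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still collected.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.first_paragraph += data


page = """<html><head><title>My Page</title></head>
<body><!-- <title>not this</title> -->
<p>First <b>bold</b> paragraph.</p><img src="/logo.png"><p>Second.</p>
</body></html>"""

parser = ShareInfoParser()
parser.feed(page)
parser.close()
print(parser.title)            # My Page
print(parser.first_paragraph)  # First bold paragraph.
print(parser.image_src)        # /logo.png
```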

Upvotes: 2

Scharron

Reputation: 17757

A regexp like <title>(.*?)</title> will get you the content of the title. The .*? part matches any characters in a non-greedy (lazy) way, so the match stops at the first closing title tag instead of running on to a later one if the page contains another.
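A short Python sketch of this pattern with the standard re module. The IGNORECASE and DOTALL flags are additions not in the original answer, so that <TITLE> and titles spanning several lines also match; the get_title helper name is made up for this example. The greedy variant is included only to show why the lazy .*? matters.

```python
import re

# Lazy .*? stops at the first </title>. IGNORECASE also accepts <TITLE>;
# DOTALL lets . match newlines inside a title split across lines.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def get_title(html):
    """Return the stripped text of the first <title> element, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


two_titles = "<title>A</title> ... <title>B</title>"

print(get_title("<html><head><TITLE>\n  Hello World\n</TITLE></head></html>"))
# Hello World
print(get_title(two_titles))
# A

# Greedy (.*) runs on to the LAST </title> instead:
print(re.search(r"<title>(.*)</title>", two_titles).group(1))
# A</title> ... <title>B
```

Like any regex approach, this still misses a title tag that carries attributes (for example <title lang="en">) and would match one inside an HTML comment.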

Upvotes: 6
