Reputation: 9006
I've already tried my best but regular expressions aren't really my thing. :(
I need to extract URLs that end in a certain file extension. For example, I want to be able to parse a large paragraph and extract all URLs that end with *.txt. Take this text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla hendrerit aliquet erat at ultrices. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt iaculis dictum. Quisque nisi neque, vulputate quis pellentesque blandit, faucibus eget nisl.
I need to be able to pull http://www.somesite.com/somefolder/blahblah/etc/something.txt out of the above paragraph, but the number of URLs to extract will vary, since it depends on what the user inputs. The text could have 3 links that end with *.txt and 3 links that don't. I only need to extract those that do end in *.txt. Can anyone give me the code I need for this?
Upvotes: 0
Views: 87
Reputation: 91385
How about:
$str = 'Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt. Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.doc.';
preg_match_all('#\b(http://\S+\.txt)\b#', $str, $m);
Explanation:
# : regex delimiter
\b : word boundary
( : begin capture group
http:// : literal http://
\S+ : one or more non-whitespace characters
\. : a literal dot
txt : literal txt
) : end capture group
\b : word boundary
# : regex delimiter
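As a rough sketch of how this pattern could be wrapped for reuse: the helper name extractTxtUrls and the sample text are hypothetical, and the https? alternative and the i flag are additions that are not part of the pattern above. They are included only on the assumption that https links and upper-case extensions should be picked up too.

```php
<?php
// Hypothetical helper: returns every http/https URL in $text that ends in .txt.
function extractTxtUrls(string $text): array
{
    // "https?" and the "i" flag are assumptions added to the answer's pattern.
    preg_match_all('#\b(https?://\S+\.txt)\b#i', $text, $m);

    return $m[1]; // group 1 holds the captured URLs
}

// Made-up sample input: two .txt links and one .doc link.
$str = 'Nibh http://www.somesite.com/etc/something.txt. Nunc '
     . 'https://www.somesite.com/etc/NOTES.TXT and '
     . 'http://www.somesite.com/etc/something.doc.';

$urls = extractTxtUrls($str);
print_r($urls);
```

The .doc link is skipped because \S+ cannot cross a space to reach a later ".txt". Note that the trailing \b also sits between "txt" and a following dot, so a name like file.txt.bak would still yield a match that stops at ".txt".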
Upvotes: 0
Reputation: 13500
Assuming these are all proper URLs, then they won't have any spaces in them. We can take advantage of that fact to make the regular expression really simple:
preg_match_all("/([^ ]+\.(txt|doc))/i", $text, $matches);
// ([^ ]+     One or more characters that are not a space.
// \.         A literal period.
// (txt|doc)  The extension "txt" or "doc".
// )/i        End of the capture group; the i flag makes it case insensitive (so TXT and TxT also work).
If you don't need to match multiple file extensions, then you can change "(txt|doc)" to "txt".
$matches will contain several arrays. You'll want key 0 (the full matches) or key 1 (the outer capture group); with this pattern both hold the same URLs. To make the array easier to read, you can name the group:
preg_match_all("/(?P<matched_urls>[^ ]+\.(txt|doc))/i", $text, $matches);
This will make $matches look something like this:
array([0] => array(), ["matched_urls"] => array(), [1] => array(), [2] => array());
The URLs you need are under the "matched_urls" key.
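A minimal sketch of reading the named group, using a made-up $text value. One caveat worth knowing: since the pattern does not require "http://", it would also pick up a bare token such as readme.txt, and [^ ] excludes only the space character, not tabs or newlines.

```php
<?php
// Made-up sample input for illustration.
$text = 'See http://www.somesite.com/a/notes.txt and '
      . 'http://www.somesite.com/b/report.DOC today.';

preg_match_all("/(?P<matched_urls>[^ ]+\.(txt|doc))/i", $text, $matches);

// The named key holds the same list as $matches[1].
foreach ($matches['matched_urls'] as $url) {
    echo $url, PHP_EOL;
}
```

If the input can contain line breaks, swapping [^ ] for \S would be one way to stop a match at any whitespace.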
Upvotes: 0
Reputation: 6136
You can find what you need with /(?<=\s)http:\/\/\S+\.txt(?=\s)/
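Which means:
(?<=\s) : lookbehind, the match must be preceded by a whitespace character
http:\/\/ : literal http:// (the slashes are escaped because / is the delimiter)
\S+ : one or more non-whitespace characters
\.txt : literal .txt
(?=\s) : lookahead, the match must be followed by a whitespace character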
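A short sketch of how the lookarounds behave, with made-up input strings. Both (?<=\s) and (?=\s) need an actual whitespace character, so a URL at the very end of the string is not matched. The second pattern, using (?<!\S) and (?!\S), is a variant that is not part of this answer; it is shown only as one possible way to also accept the start and end of the string.

```php
<?php
// Pattern from the answer: whitespace is required on both sides.
$strict = '/(?<=\s)http:\/\/\S+\.txt(?=\s)/';

// Assumed variant: "not preceded/followed by a non-space character",
// which also succeeds at the start and end of the string.
$relaxed = '/(?<!\S)http:\/\/\S+\.txt(?!\S)/';

$middle = 'see http://www.somesite.com/a.txt now';
$atEnd  = 'see http://www.somesite.com/a.txt';

preg_match_all($strict, $middle, $m1);   // URL surrounded by spaces
preg_match_all($strict, $atEnd, $m2);    // URL at the end of the string
preg_match_all($relaxed, $atEnd, $m3);   // same input, relaxed pattern

var_dump($m1[0], $m2[0], $m3[0]);
```

Neither pattern matches a URL that is followed directly by punctuation, such as a sentence-ending period.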
Upvotes: 1