Reputation: 57
I am a regex beginner and I have been practicing by going through a problem on this website. I am given the following text:
Fedora Core ftp
Fedora Extras http ftp rsync
ftp://ftp7.br.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp3.de.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.is.FreeBSD.org/pub/FreeBSD/ (ftp / rsync)
ftp://ftp4.jp.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.no.FreeBSD.org/pub/FreeBSD/ (ftp / rsync)
*
ftp://ftp3.no.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.pt.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp1.ro.FreeBSD.org/pub/FreeBSD/ (ftp / ftpv6)
ftp://ftp3.es.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp2.tw.FreeBSD.org/pub/FreeBSD/ (ftp / ftpv6 / http / httpv6 / rsync / rsyncv6)
ftp://ftp6.uk.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp6.us.FreeBSD.org/pub/FreeBSD/ (ftp / http)
sunsite.informatik.rwth-aachen.de [ftp] [http] Rheinisch-Westfälische Technische Hochschule Aachen
lame.lut.fi [http] Computer Club Ruut (Finland)
1 Gbits/sec IPv4 and IPv6
FR Fedora Mirror ftp.proxad.net
US distro.ibiblio.org jungle.metalab.unc.edu
Fedora Core ftp
ftp://ftp.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp11.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp14.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.ar.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp3.au.FreeBSD.org/pub/FreeBSD/ (ftp)
In case of problems, please contact the hostmaster <[email protected]> for this domain.
ftp://ftp4.br.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.hr.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.cz.FreeBSD.org/pub/FreeBSD/ (ftp / http / rsync)
ftp://ftp.il.FreeBSD.org/pub/FreeBSD/ (ftp / ftpv6)
ftp://ftp7.jp.FreeBSD.org/pub/FreeBSD/ (ftp)
*
ftp://ftp7.ua.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp11.ua.FreeBSD.org/pub/FreeBSD/ (ftp)
I need to extract all the ftp addresses, i.e. everything that starts with ftp:// and ends with FreeBSD/. I have been able to extract some of them with this regex:
ftp://ftp\d\d?.\w\w.FreeBSD.org/pub/FreeBSD/
But many are not matched, e.g. ftp://ftp14.FreeBSD.org/pub/FreeBSD/. There are no answers provided, so please let me know what my expression is missing so I can improve. Thank you.
Upvotes: 2
Views: 98
Reputation: 89547
It seems you are trying to extract all URLs with the domain "FreeBSD.org" followed by the path "/pub/FreeBSD/".
I suggest:
\bftp://[A-Za-z0-9.]*\bFreeBSD\.org/pub/FreeBSD/
Note that the dot needs to be escaped outside a character class but not inside.
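To make the escaping note concrete, here is a minimal Python sketch. It assumes Python's `re` module (the question does not name a regex flavor) and uses a shortened copy of the question's text; the variable names are only illustrative.

```python
import re

# Shortened sample taken from the text in the question.
sample = """Fedora Core ftp
ftp://ftp7.br.FreeBSD.org/pub/FreeBSD/ (ftp)
ftp://ftp.is.FreeBSD.org/pub/FreeBSD/ (ftp / rsync)
ftp://ftp14.FreeBSD.org/pub/FreeBSD/ (ftp)
lame.lut.fi [http] Computer Club Ruut (Finland)
"""

# The suggested pattern: the dot is literal inside [...],
# but must be written as \. outside a character class.
pattern = re.compile(r"\bftp://[A-Za-z0-9.]*\bFreeBSD\.org/pub/FreeBSD/")
urls = pattern.findall(sample)

# An unescaped dot matches any character; an escaped dot or a dot
# inside a character class matches only a literal ".".
unescaped_hits = bool(re.fullmatch(r"a.c", "abc"))
escaped_hits = bool(re.fullmatch(r"a\.c", "abc"))
in_class_hits = bool(re.fullmatch(r"a[.]c", "abc"))

print(urls)
print(unescaped_hits, escaped_hits, in_class_hits)
```

The address without a number or country code prefix restrictions, ftp14.FreeBSD.org, is picked up because `[A-Za-z0-9.]*` accepts any mix of letters, digits, and dots before the domain.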
Upvotes: 3
Reputation: 27723
This expression might simply extract those desired FTPs:
ftp://\S*/FreeBSD/
If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel, and you can also see there how it matches against some sample inputs.
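As a rough comparison, the sketch below runs this shorter expression next to the stricter one from the other answer. It assumes Python's `re` module; the `mirror.example.com` line is a hypothetical input that does not appear in the question and is only there to show where the two patterns differ.

```python
import re

loose = re.compile(r"ftp://\S*/FreeBSD/")
strict = re.compile(r"\bftp://[A-Za-z0-9.]*\bFreeBSD\.org/pub/FreeBSD/")

lines = [
    "ftp://ftp2.tw.FreeBSD.org/pub/FreeBSD/ (ftp / ftpv6 / http / httpv6 / rsync / rsyncv6)",
    "ftp://ftp.FreeBSD.org/pub/FreeBSD/ (ftp)",
    "ftp://mirror.example.com/FreeBSD/ (hypothetical, not in the question)",
    "FR Fedora Mirror ftp.proxad.net",
]

def first_matches(regex, rows):
    """Return the first match of regex in each row that has one."""
    found = []
    for row in rows:
        m = regex.search(row)
        if m:
            found.append(m.group(0))
    return found

loose_hits = first_matches(loose, lines)
strict_hits = first_matches(strict, lines)

print(loose_hits)
print(strict_hits)
```

The short pattern is enough for the text in the question, but it accepts any host as long as the path ends in /FreeBSD/, so which one fits better depends on how varied the real input is.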
Upvotes: 2
Reputation: 31
Look at this:
ftp://ftp(\d{0,2}.\w{0,2})?.FreeBSD.org/pub/FreeBSD/
Think about what is constant and what changes in your ftp addresses. The beginning is always the same. Then you can have 0-2 digits after ftp, followed by a dot, optionally followed by a two-letter code (a country code?).
At least one address has neither a country code nor digits after ftp, so make that part optional (using ?). The rest is always constant, i.e. .FreeBSD.org/pub/FreeBSD/. Hope this helps.
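The constant-versus-variable reasoning can be sketched in Python as below, assuming the `re` module. The `stricter` pattern is an adaptation, not the expression given in this answer: it escapes every literal dot and makes only the two-letter part optional. The string in `malformed` is a made-up input used to show the effect of an unescaped dot.

```python
import re

# The expression as posted: each bare "." matches any character.
posted = re.compile(r"ftp://ftp(\d{0,2}.\w{0,2})?.FreeBSD.org/pub/FreeBSD/")

# Adapted variant (an assumption, not the answer's own regex):
# constant prefix, 0-2 digits, optional ".xx" code, constant suffix.
stricter = re.compile(r"ftp://ftp\d{0,2}(\.\w{2})?\.FreeBSD\.org/pub/FreeBSD/")

addresses = [
    "ftp://ftp.FreeBSD.org/pub/FreeBSD/",
    "ftp://ftp14.FreeBSD.org/pub/FreeBSD/",
    "ftp://ftp7.br.FreeBSD.org/pub/FreeBSD/",
    "ftp://ftp11.ua.FreeBSD.org/pub/FreeBSD/",
]

posted_ok = [bool(posted.fullmatch(a)) for a in addresses]
stricter_ok = [bool(stricter.fullmatch(a)) for a in addresses]

# Hypothetical malformed input: "X" sits where a dot should be.
malformed = "ftp://ftpXFreeBSD.org/pub/FreeBSD/"
posted_accepts_malformed = bool(posted.fullmatch(malformed))
stricter_accepts_malformed = bool(stricter.fullmatch(malformed))

print(posted_ok, stricter_ok)
print(posted_accepts_malformed, stricter_accepts_malformed)
```

Both patterns cover the addresses from the question; the difference only shows up on input where something other than a dot appears in a dot's position.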
Upvotes: 3