Reputation: 225
I'm writing a tool that queries http://checkip.dyndns.org/ to get the user's IP address. I need to parse the result, which comes back in the form:
<html><head><title>Current IP Check</title></head><body>Current IP Address: 128.237.138.116</body></html>
I could do something awkward with repeated calls to int_of_string, but I imagine there has to be a nice, concise way to do this with regular expressions or something similar, e.g. something of the form
let ip_re = Str.regexp ".*Address: %d.%d.%d.%d".
Or perhaps this is best done with scanf? Can someone with more knowledge of idiomatic OCaml point me the right way?
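For comparison, the scanf idea can be sketched with the standard Scanf module alone. This is a minimal sketch, not a robust parser: `parse_ip` is a hypothetical name, and because Scanf has no `.*`, the format skips everything up to the first ':' with `%_[^:]`. That assumes the first ':' in the response is the one after "Address", which holds for the sample above but is not guaranteed for other pages.

```ocaml
(* Hypothetical helper: extract the four octets with Scanf.
   Assumption: the first ':' in [s] directly precedes the address. *)
let parse_ip (s : string) : (int * int * int * int) option =
  try
    (* %_[^:] skips everything up to the first ':', the space in the
       format skips any whitespace, then four dot-separated integers
       are read. Trailing input (e.g. "</body></html>") is ignored. *)
    Some (Scanf.sscanf s "%_[^:]: %d.%d.%d.%d" (fun a b c d -> (a, b, c, d)))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

let () =
  let page =
    "<html><head><title>Current IP Check</title></head>\
     <body>Current IP Address: 128.237.138.116</body></html>"
  in
  match parse_ip page with
  | Some (a, b, c, d) -> Printf.printf "%d.%d.%d.%d\n" a b c d
  | None -> print_endline "no address found"
```

Returning ints rather than a string makes range checks (0 to 255) easy to add afterwards.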
Upvotes: 2
Views: 348
Reputation: 1
You can try:
curl ip.sb
curl ipv4.ip.sb
curl ipv6.ip.sb
to get your current IP address, your IPv4 address, and your IPv6 address, respectively.
Upvotes: 0
Reputation:
No need for regular expressions.
Here's a self-contained example. It should run in utop and depends on ezxmlm, which you can install with opam install ezxmlm:
#require "ezxmlm, str";;

let example = "<html><head><title>Current IP Check</title></head>\
               <body>Current IP Address: 128.237.138.116</body></html>"

let () =
  let open Ezxmlm in
  let (_, xml) = from_string example in
  (* Text inside <body>, i.e. "Current IP Address: 128.237.138.116" *)
  let content = member "html" xml |> member "body" |> data_to_string in
  (* Brittle: assumes the address is everything after the last ':' and one space *)
  let sub_str_i = String.rindex content ':' + 2 in
  print_endline (Str.string_after content sub_str_i)
Upvotes: 1
Reputation: 66808
You don’t say what you really want to do. Since the answer is coming from a moderately reliable source, let’s say you just want to extract the IP address. In other words, you want to be somewhat tolerant of small changes in the format while extracting an IP address that you’re almost certain is really there.
For the value you give, I’d be inclined to do something like this:
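(* Needs the str library, e.g. #require "str" in utop. *)
let extract_ip s =
  (* Split on runs of non-digits so only the numbers remain, then rejoin with dots *)
  let nums = Str.split (Str.regexp "[^0-9]+") s in
  String.concat "." nums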
If you want to be a little more careful you could verify that there are 4 numbers in the list. To be even more careful you could verify that each number is between 0 and 255 (inclusive).
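A minimal sketch of those two extra checks, assuming you would rather get `None` than a malformed string when they fail. The name `extract_ip_checked` is hypothetical, and the digit splitting is done with the standard library (`String.map` plus `String.split_on_char`) instead of Str, so it runs without linking str; `Str.split (Str.regexp "[^0-9]+")` as above yields the same list.

```ocaml
(* Every maximal run of ASCII digits in [s], in order: non-digits become
   spaces, then we split on spaces and drop the empty pieces. *)
let digit_runs (s : string) : string list =
  String.map (fun c -> if c >= '0' && c <= '9' then c else ' ') s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* An octet must parse as an int in 0..255 inclusive. *)
let valid_octet (n : string) : bool =
  match int_of_string_opt n with
  | Some v -> v >= 0 && v <= 255
  | None -> false

(* Hypothetical checked variant of extract_ip: exactly four numbers,
   each a valid octet, otherwise None. *)
let extract_ip_checked (s : string) : string option =
  match digit_runs s with
  | [ _; _; _; _ ] as nums when List.for_all valid_octet nums ->
      Some (String.concat "." nums)
  | _ -> None
```

A page with stray digits (say an `<h1>` heading) now produces `None` instead of a wrong address.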
This will fail if Dyndns introduces any digits in the page that aren’t part of the IP address (things like <h1>, more complicated label text, etc.). You can respond by making this code a little more clever (e.g., take the last 4 numbers you see on the page), or you could give in and actually parse the HTML. My suggestion: don’t try to use regular expressions for that; use a real HTML parser.
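Taking the last four numbers could look roughly like this. It is a sketch that assumes the address is the final group of digits on the page; `extract_ip_last4` is a hypothetical name, and it uses only the standard library rather than Str.

```ocaml
(* Maximal runs of ASCII digits in [s], in order. *)
let digit_runs (s : string) : string list =
  String.map (fun c -> if c >= '0' && c <= '9' then c else ' ') s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* Hypothetical helper: join the last four digit runs with dots.
   Digits earlier in the page (e.g. the 1 in <h1>) are ignored. *)
let extract_ip_last4 (s : string) : string option =
  match List.rev (digit_runs s) with
  | d :: c :: b :: a :: _ -> Some (String.concat "." [ a; b; c; d ])
  | _ -> None

let () =
  let page =
    "<html><body><h1>IP Check</h1>\
     Current IP Address: 128.237.138.116</body></html>"
  in
  match extract_ip_last4 page with
  | Some ip -> print_endline ip
  | None -> print_endline "fewer than four numbers on the page"
```

This only moves the failure point: any digit after the address, such as a closing `</h2>`, would still break it, which is the argument for a real parser.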
Upvotes: 2