Reputation: 193
I'm trying to get whois data using this function:
function getDomain()
{
    $domain = 'stackoverflow.com';
    $whois = '';
    $connection = @fsockopen('whois.internic.net', 43);
    if ($connection) {
        @fputs($connection, $domain . "\r\n");
        while (!feof($connection)) {
            $whois .= @fgets($connection, 128);
        }
        fclose($connection);
    }
    return $whois;
}
It works great for some domains, but when I try "apple.com", "cnn.com" or "google.com" I get this:
APPLE.COM.ZON.COM
APPLE.COM.WWW.ZON.COM
APPLE.COM.WWW.BEYONDWHOIS.COM
APPLE.COM.WAS.PWNED.BY.M1CROSOFT.COM
APPLE.COM.MORE.INFO.AT.WWW.BEYONDWHOIS.COM
APPLE.COM.IS.OWN3D.BY.NAKEDJER.COM
APPLE.COM.IS.0WN3D.BY.GULLI.COM
APPLE.COM.DENIS.DA.DOIDE.DA.PIEM.UNIX-BG.COM
APPLE.COM.BEYONDWHOIS.COM
APPLE.COM.AT.WWW.BEYONDWHOIS.COM
APPLE.COM
Upvotes: 1
Views: 1270
Reputation: 12485
Prefix your queries with an = sign, like =example.com, instead of just the domain name and you will not see the extra results.
The long reason is that by default the whois server does a prefix search and returns every object stored at the registry whose name starts with the name you give in the query. And, as little known as it may be, nameservers are objects stored at registries, and long ago it was deemed funny to register useless nameservers just to "prank" innocent onlookers: their queries would come back with results like yours, which people not knowing the details could interpret as "Oh my god, XXXX has been hacked, see these results...". You could basically add any word as a label (between two dots) as long as the name ended in a .COM/.NET domain you hold. Of course this has nothing to do with an actual attack whatsoever.
By prefixing with an equals sign you force an exact match instead of a prefix match. Note that a purist could add that even in that case you may see two records back, as you can register a nameserver whose name is the name of the domain name itself (which is more confusing than useful, but it is possible and it happens).
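As a sketch of that fix in the asker's own PHP (same fsockopen approach; the function name and the 10-second timeout are my additions):

```php
<?php
// Query a whois server on port 43 for an exact match by prefixing
// the domain with '=' (suppresses the prefix-search behaviour).
function whoisExact($domain, $server = 'whois.internic.net')
{
    $whois = '';
    $connection = @fsockopen($server, 43, $errno, $errstr, 10);
    if ($connection) {
        fputs($connection, '=' . $domain . "\r\n");
        while (!feof($connection)) {
            $whois .= fgets($connection, 128);
        }
        fclose($connection);
    }
    return $whois;
}

echo whoisExact('apple.com');
```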
BTW, for .COM domain names you should use the relevant registry whois server, which is whois.verisign-grs.com. Ditto for other TLDs. Be aware that, depending on what you search for, you may need two whois queries per domain, as .COM/.NET is (currently) still a thin registry: the registry only records which registrar manages the name, and the registrar's own whois server holds the full details. See my answer at https://unix.stackexchange.com/a/407030/211833 for further details on this point.
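A minimal sketch of that two-step thin-registry lookup (the "Registrar WHOIS Server:" field name is taken from Verisign's registry output; the helper name and regex are my assumptions):

```php
<?php
// Step 1: ask the thin .COM/.NET registry which registrar whois
// server holds the full record; step 2: query that server.
function whoisQuery($server, $query)
{
    $out = '';
    $conn = @fsockopen($server, 43, $errno, $errstr, 10);
    if ($conn) {
        fputs($conn, $query . "\r\n");
        while (!feof($conn)) {
            $out .= fgets($conn, 128);
        }
        fclose($conn);
    }
    return $out;
}

$registry = whoisQuery('whois.verisign-grs.com', '=apple.com');

// Pull the registrar's whois server out of the registry answer,
// then repeat the query there for the full record.
if (preg_match('/Registrar WHOIS Server:\s*(\S+)/i', $registry, $m)) {
    echo whoisQuery($m[1], 'apple.com');
} else {
    echo $registry;
}
```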
Upvotes: 0
Reputation: 5829
Your script ONLY queries whois.internic.net; remember there are several domain / IP registries worldwide.
The full-blown tools, such as those provided in most Linux distributions, know to try several different servers and then examine the data from all of them to determine which server is the authoritative one.
From memory I believe there are 5 worldwide authoritative zones: the internic one you already have, plus:
whois.afrinic.net
whois.lacnic.net
whois.arin.net
whois.apnic.net
RIPE (the central registry we use here in Europe) also has one: whois.ripe.net.
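A sketch of the multi-server approach in the asker's PHP, trying each regional server in turn and keeping the first answer that looks like a real record (the server order and the crude "no match" heuristic are my assumptions; real tools follow referrals instead):

```php
<?php
// Query each regional whois server in turn and return the first
// response that does not look like a "no match" answer.
function whoisAny($query)
{
    $servers = array(
        'whois.internic.net',
        'whois.arin.net',
        'whois.ripe.net',
        'whois.apnic.net',
        'whois.lacnic.net',
        'whois.afrinic.net',
    );
    foreach ($servers as $server) {
        $out = '';
        $conn = @fsockopen($server, 43, $errno, $errstr, 10);
        if (!$conn) {
            continue;                     // server unreachable, try the next
        }
        fputs($conn, $query . "\r\n");
        while (!feof($conn)) {
            $out .= fgets($conn, 128);
        }
        fclose($conn);
        if ($out !== '' && stripos($out, 'no match') === false) {
            return $out;                  // looks like a real record
        }
    }
    return '';
}
```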
Now, aside from what I've said above, you may want to consider the following. Most whois authorities will throttle (or even block) your traffic if they deem that you're making too many requests in a 24-hour period. Instead you might want to consider logging in to the FTP site of any of the above providers, downloading the various bits of the database, and then writing (or finding) your own script to process them.
I currently do that with one of my own servers, which connects using the following shell script (once every 24 hours):
#!/bin/bash
# Clear out yesterday's files first so wget does not save
# numbered duplicates (delegated-arin-latest.1 and so on).
rm -f delegated-afrinic-latest
rm -f delegated-lacnic-latest
rm -f delegated-arin-latest
rm -f delegated-apnic-latest
rm -f delegated-ripencc-latest
rm -f ripe.db.inetnum
rm -f apnic.db.inetnum
rm -f ripe.db.inetnum.gz
rm -f apnic.db.inetnum.gz
# Fetch each registry's daily delegation summary.
wget ftp://ftp.afrinic.net/pub/stats/afrinic/delegated-afrinic-latest
wget ftp://ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-latest
wget ftp://ftp.arin.net/pub/stats/arin/delegated-arin-latest
wget ftp://ftp.apnic.net/pub/stats/apnic/delegated-apnic-latest
wget ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest
wget ftp://ftp.ripe.net/ripe/dbase/split/ripe.db.inetnum.gz
# APNIC's split database is fetched with a plain anonymous FTP session.
ftp -n -v ftp.apnic.net <<END
user anonymous [email protected]
binary
passive
get /apnic/whois-data/APNIC/split/apnic.db.inetnum.gz apnic.db.inetnum.gz
bye
END
# gunzip finds the .gz suffix by itself.
gunzip ripe.db.inetnum
gunzip apnic.db.inetnum
I then have a custom-written program that parses the files out into a custom database structure, which my servers then do their queries from.
Since all the servers mirror each other's data, you should be able to get a full data set from one server; but if not, it wouldn't take much to modify the above shell script to download the data from the other servers. All of them respond to 'ftp.????' and have the same universal folder structure.
I can't help you with the parser, however, as that contains proprietary code, but the file format (especially if you get the split files) is identical to what you see in typical whois output, so it's very easy to work with.
By downloading and processing your own data like that, you get around any limit imposed by the providers, and the upshot is that it's most likely way faster to query your own data store than keep firing off requests from your server to the query servers every time someone enters an IP address.
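Since the split files are whois-style "key: value" records separated by blank lines, a parser is straightforward. A minimal sketch in PHP (the function name is mine, and keying records by their inetnum range is a simplification; a real importer would load them into a database):

```php
<?php
// Parse a split database file (e.g. ripe.db.inetnum) into an array
// of records keyed by their inetnum range. Records are blocks of
// "key: value" lines separated by blank lines; '#' and '%' lines
// are comments. Later values for a repeated key overwrite earlier ones.
function parseInetnumFile($path)
{
    $records = array();
    $current = array();
    foreach (file($path) as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {                    // blank line ends a record
            if (isset($current['inetnum'])) {
                $records[$current['inetnum']] = $current;
            }
            $current = array();
        } elseif ($line[0] === '#' || $line[0] === '%') {
            continue;                          // skip comment lines
        } elseif (preg_match('/^([a-z-]+):\s*(.*)$/', $line, $m)) {
            $current[$m[1]] = $m[2];
        }
    }
    if (isset($current['inetnum'])) {          // file may lack a final blank line
        $records[$current['inetnum']] = $current;
    }
    return $records;
}
```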
There are many, many more whois servers than just those I listed here. Rather than list them all out on this page, this link:
https://jfreewhois.googlecode.com/git/JFreeWhois/src/uk/org/freedonia/jfreewhois/etc/serverlist.xml
will take you to an XML file that is part of a project on Google Code. It gives a pretty big list of all the whois servers available, plus the TLDs each of them serves, enabling you to adapt your script to talk to the correct server depending on the address entered.
Upvotes: 2