Reputation: 1547
I want to parse information out of this .txt file, which appears to be tab-delimited. I would like the base 16 value (e.g. 000000) as the dictionary key and the company name (e.g. Xerox Corporation) as the dictionary value, so that looking up the key 000001, for example, returns Xerox Corporation.
I've tried parsing the .txt file as a CSV and reading the entry on every nth line, but the number of lines per entry varies, so there is no fixed n.
Is there a way to capture the value preceding the term "base 16", together with the text that follows it, to make a dictionary entry?
Many thanks
Upvotes: 1
Views: 1212
Reputation: 117661
Entries are separated by two newlines (that is, a blank line). The second line of each entry is always the base 16 one: the first token before the first tab is the base 16 key, and the text after the last tab is the company name.
import urllib
inputfile = urllib.urlopen("http://standards.ieee.org/develop/regauth/oui/oui.txt")
data = inputfile.read()
entries = data.split("\n\n")[1:-1] #ignore first and last entries, they're not real entries
d = {}
for entry in entries:
    # second line of the entry is the "(base 16)" line; its fields are tab-separated
    parts = entry.split("\n")[1].split("\t")
    company_id = parts[0].split()[0]
    company_name = parts[-1]
    d[company_id] = company_name
Some of the results:
40F52E: Leica Microsystems (Schweiz) AG
3831AC: WEG
00B0F0: CALY NETWORKS
9CC077: PrintCounts, LLC
000099: MTX, INC.
000098: CROSSCOMM CORPORATION
000095: SONY TEKTRONIX CORP.
000094: ASANTE TECHNOLOGIES
000097: EMC Corporation
000096: MARCONI ELECTRONICS LTD.
000091: ANRITSU CORPORATION
000090: MICROCOM
000093: PROTEON INC.
000092: COGENT DATA TECHNOLOGIES
002192: Baoding Galaxy Electronic Technology Co.,Ltd
90004E: Hon Hai Precision Ind. Co.,Ltd.
002193: Videofon MV
00A0D4: RADIOLAN, INC.
E0F379: Vaddio
002190: Goliath Solutions
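For readers on Python 3, where `urllib.urlopen` no longer exists, the same blank-line splitting can be sketched against an in-memory string. The sample text below is made up to mimic the layout described above (it is not copied from oui.txt), and `parse_oui_blocks` is a hypothetical helper name. It also assumes plain `\n` line endings; a file with `\r\n` endings would need normalising first.

```python
# Illustrative sample only: two invented entries separated by a blank line.
SAMPLE = (
    "AA-BB-CC   (hex)\t\tExample Widgets Inc.\n"
    "AABBCC     (base 16)\t\tExample Widgets Inc.\n"
    "\t\t\t\t1 Sample Street\n"
    "\n"
    "DD-EE-FF   (hex)\t\tPlaceholder Networks\n"
    "DDEEFF     (base 16)\t\tPlaceholder Networks\n"
    "\t\t\t\t2 Sample Avenue\n"
)


def parse_oui_blocks(text):
    """Map each base 16 id to its company name, one blank-line-separated block at a time."""
    result = {}
    for block in text.split("\n\n"):
        lines = block.split("\n")
        # Skip headers or malformed blocks that lack a "(base 16)" second line.
        if len(lines) < 2 or "(base 16)" not in lines[1]:
            continue
        parts = lines[1].split("\t")
        company_id = parts[0].split()[0]   # first token before the first tab
        company_name = parts[-1].strip()   # text after the last tab
        result[company_id] = company_name
    return result


companies = parse_oui_blocks(SAMPLE)
print(companies["AABBCC"])
```

Checking for "(base 16)" instead of slicing off the first and last entries is a design choice made here so the sketch does not depend on how many header or footer blocks the file has.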
Upvotes: 1
Reputation: 45542
>>> import urllib
...
... f = urllib.urlopen('http://standards.ieee.org/develop/regauth/oui/oui.txt')
... d = dict([(s[:6], s[22:].strip()) for s in f if 'base 16' in s])
... print d['000001']
XEROX CORPORATION
Upvotes: 1
Reputation: 287775
def oui_parse(fn='oui.txt'):
    with open(fn) as ouif:
        content = ouif.read()
    for block in content.split('\n\n'):
        lines = block.split('\n')
        if not lines or '(hex)' not in lines[0]:  # First block
            continue
        assert '(base 16)' in lines[1]
        d = {}
        d['oui'] = lines[1].split()[0]
        d['company'] = lines[1].split('\t')[-1]
        if len(lines) == 6:
            d['division'] = lines[2].strip()
        d['street'] = lines[-3].strip()
        d['city'] = lines[-2].strip()
        d['country'] = lines[-1].strip()
        yield d

oui_info = list(oui_parse())
Upvotes: 1
Reputation: 13085
result = dict()
for lig in open('oui.txt'):
    if 'base 16' in lig:
        # split the line around the marker: id on the left, company name on the right
        num, sep, txt = lig.strip().partition('(base 16)')
        result[num.strip()] = txt.strip()
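The question asked about capturing the text on either side of "base 16"; a regular expression is another way to express that. The following is only a sketch for Python 3, assuming each relevant line has the shape `<6 hex digits> (base 16) <company>`. The sample lines are invented for illustration, and `OUI_LINE` and `build_lookup` are hypothetical names.

```python
import re

# Group 1: the six hex digits before "(base 16)"; group 2: the company name after it.
OUI_LINE = re.compile(r"^\s*([0-9A-Fa-f]{6})\s+\(base 16\)\s+(.*\S)\s*$")


def build_lookup(lines):
    """Build {id: company} from any iterable of lines, e.g. an open file."""
    lookup = {}
    for line in lines:
        match = OUI_LINE.match(line)
        if match:
            lookup[match.group(1).upper()] = match.group(2)
    return lookup


# Invented sample lines mimicking one entry of the file.
sample_lines = [
    "AA-BB-CC   (hex)\t\tExample Widgets Inc.\n",
    "AABBCC     (base 16)\t\tExample Widgets Inc.\n",
    "\t\t\t\t1 Sample Street\n",
]

lookup = build_lookup(sample_lines)
print(lookup.get("AABBCC"))
```

Because `build_lookup` accepts any iterable of lines, it could be passed an open file object directly, which avoids reading the whole file into memory.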
Upvotes: 1