thefragileomen

Reputation: 1547

Process .txt file into dictionary (Python v2.7)

I am looking to parse information out of this .txt file. The file appears to be tab delimited. I want the base 16 value (e.g. 000000) as the dictionary key and the company name (e.g. Xerox Corporation) as the dictionary value. So if, for example, I look up the key 000001 in my dictionary, Xerox Corporation would be returned as the respective value.

I've tried parsing the .txt file as a CSV and reading the entry on every nth line, but unfortunately there is no fixed pattern and n varies.

Is there any way to capture the value preceding the term "base 16", for example, and then the term that follows it, to make a dictionary entry?

Many thanks

Upvotes: 1

Views: 1212

Answers (4)

orlp

Reputation: 117661

Entries are separated by a blank line (two newlines). The second line of each entry is always the base 16 one. On that line, the first token before the first tab is the base 16 key, and the text after the last tab is the company name.

import urllib

inputfile = urllib.urlopen("http://standards.ieee.org/develop/regauth/oui/oui.txt")
data = inputfile.read()

entries = data.split("\n\n")[1:-1] #ignore first and last entries, they're not real entries

d = {}
for entry in entries:
    parts = entry.split("\n")[1].split("\t")
    company_id = parts[0].split()[0]
    company_name = parts[-1]
    d[company_id] = company_name

Some of the results:

40F52E: Leica Microsystems (Schweiz) AG
3831AC: WEG
00B0F0: CALY NETWORKS
9CC077: PrintCounts, LLC
000099: MTX, INC.
000098: CROSSCOMM CORPORATION
000095: SONY TEKTRONIX CORP.
000094: ASANTE TECHNOLOGIES
000097: EMC Corporation
000096: MARCONI ELECTRONICS LTD.
000091: ANRITSU CORPORATION
000090: MICROCOM
000093: PROTEON INC.
000092: COGENT DATA TECHNOLOGIES
002192: Baoding Galaxy Electronic Technology  Co.,Ltd
90004E: Hon Hai Precision Ind. Co.,Ltd.
002193: Videofon MV
00A0D4: RADIOLAN,  INC.
E0F379: Vaddio
002190: Goliath Solutions
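
To follow the splitting steps without downloading the file, here is a minimal sketch on a hand-made sample. It assumes the layout described above (blank-line separated entries, the "(base 16)" line second, a tab before the company name). The sample text, company names and the variable names `sample` and `second_line` are invented for illustration and are not taken from the real file.

```python
# Invented sample text: the company names and surrounding blocks are placeholders.
sample = (
    "header text that is not an entry\n"
    "\n"
    "00-00-00   (hex)\t\tEXAMPLE CORP A\n"
    "000000     (base 16)\t\tEXAMPLE CORP A\n"
    "\t\t\t\t1 EXAMPLE STREET\n"
    "\n"
    "00-00-01   (hex)\t\tEXAMPLE CORP B\n"
    "000001     (base 16)\t\tEXAMPLE CORP B\n"
    "\t\t\t\t2 EXAMPLE STREET\n"
    "\n"
    "trailing text that is not an entry\n"
)

entries = sample.split("\n\n")[1:-1]  # drop the non-entry first and last blocks

d = {}
for entry in entries:
    second_line = entry.split("\n")[1]   # the "(base 16)" line of this entry
    parts = second_line.split("\t")      # tab-separated fields
    d[parts[0].split()[0]] = parts[-1]   # first token is the key, last field the name
```

If the real file ever has an entry whose second line is not the "(base 16)" one, this approach would pick up the wrong line, so a check for that marker may be worth adding.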

Upvotes: 1

Steven Rumbalski

Reputation: 45542

>>> import urllib
>>> f = urllib.urlopen('http://standards.ieee.org/develop/regauth/oui/oui.txt')
>>> d = dict([(s[:6], s[22:].strip()) for s in f if 'base 16' in s])
>>> print d['000001']
XEROX CORPORATION
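
The two slices rely on fixed column positions in each "base 16" line. A small sketch on one hand-made line shows what they pick out. It assumes six hex digits, five spaces, the text "(base 16)" and two tabs before the name. The line itself and the names `line`, `key` and `name` are made up for illustration.

```python
# Invented line; assumes the fixed-width layout described in the lead-in.
line = "000001     (base 16)\t\tEXAMPLE CORP B\n"

key = line[:6]             # the first six characters are the hex prefix
name = line[22:].strip()   # everything from column 22 on, minus whitespace and newline
```

Because the result is stripped, a small drift in the leading whitespace would still give the same name, but a shorter prefix column would not.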

Upvotes: 1

phihag

Reputation: 287775

def oui_parse(fn='oui.txt'):
    with open(fn) as ouif:
        content = ouif.read()
    for block in content.split('\n\n'):
        lines = block.split('\n')

        if not lines or '(hex)' not in lines[0]:  # skip the header and any non-entry block
            continue

        assert '(base 16)' in lines[1]
        d = {}
        d['oui'] = lines[1].split()[0]
        d['company'] = lines[1].split('\t')[-1]
        if len(lines) == 6:
            d['division'] = lines[2].strip()
        d['street'] = lines[-3].strip()
        d['city'] = lines[-2].strip()
        d['country'] = lines[-1].strip()
        yield d

oui_info = list(oui_parse())

Upvotes: 1

dugres

Reputation: 13085

result = dict()
for lig in open('oui.txt'):
    if 'base 16' in lig:
        num, sep, txt = lig.strip().partition('(base 16)')
        result[num.strip()] = txt.strip()
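
As a rough illustration of what `partition` returns here, this sketch runs the same steps on a single hand-made line. The line content and the name `line` are invented, and the real file may differ.

```python
# Invented line in the assumed "<hex>  (base 16)<tabs><company>" shape.
line = "000001     (base 16)\t\tEXAMPLE CORP B\n"

# partition splits on the first occurrence of the separator and
# returns (text before, separator, text after).
num, sep, txt = line.strip().partition("(base 16)")

result = {}
result[num.strip()] = txt.strip()  # strip the padding spaces and tabs
```

If the separator is missing, `partition` returns the whole string followed by two empty strings, which is why the loop above checks for 'base 16' first.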

Upvotes: 1
