klex52s

Reputation: 437

Convert txt file with multiple columns to dictionary

I have a .txt file in the following format:

AM|75019|Caribbean from 15N to 18N between 80W and 85W|18.757950|-81.741300
AM|75021|Caribbean from 15N to 18N between 72W and 80W|18.757950|-81.741300
AM|75015|Caribbean approaches to the Windward Passage|15.133340|-68.139050

I want to extract only the first two columns to use as key:value. For example, AM:75019, AM:75021, etc. I'm new to Python (using 2.6) and am not sure how to do this. I've searched and found several answers, but they don't entirely make sense to me because my file has more than two columns.

Upvotes: 1

Views: 5345

Answers (5)

martineau

Reputation: 123501

Dictionaries in Python cannot have duplicate keys, so the closest thing you could do would be to store a list of values associated with each key.

Your file is composed of character separated values, so using Python's csv module would make parsing the file into separate fields trivial.

Here's one way to accomplish what you want. Note that you could also use the collections.defaultdict class, which was added to Python v2.5, instead of defining one of your own as shown below:

import csv
from pprint import pprint

class ListDict(dict):
    """ Dictionary who's values are lists. """
    def __missing__(self, key):
        value = self[key] = []
        return value

filename = 'multi_col.csv'

lstdct = ListDict()
with open(filename, 'rb') as csvfile:
    for row in csv.reader(csvfile, delimiter='|'):
        key, value = row[:2]
        lstdct[key].append(value)

pprint(lstdct)  # -> {'AM': ['75019', '75021', '75015']}
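
As a rough sketch of the collections.defaultdict variant mentioned above, the custom class can be dropped entirely. The function name first_two_columns and the inline sample list are made up for illustration; csv.reader accepts any iterable of lines, so an open file object should work the same way.

```python
import csv
from collections import defaultdict

def first_two_columns(lines, delimiter='|'):
    """Group the second column's values under the first column's key."""
    grouped = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or short rows
        grouped[row[0]].append(row[1])
    return dict(grouped)

# Sample rows copied from the question; a real script would pass a file object.
sample = [
    'AM|75019|Caribbean from 15N to 18N between 80W and 85W|18.757950|-81.741300',
    'AM|75021|Caribbean from 15N to 18N between 72W and 80W|18.757950|-81.741300',
    'AM|75015|Caribbean approaches to the Windward Passage|15.133340|-68.139050',
]
result = first_two_columns(sample)
print(result)  # {'AM': ['75019', '75021', '75015']}
```

Converting back to a plain dict at the end is optional; it only stops later lookups of missing keys from silently creating empty lists.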

Upvotes: 1

Nishant Nawarkhede

Reputation: 8400

I want to extract only the first two columns to use as key:value. For example, AM:75019, AM:75021, etc.

If a key is duplicated in a dict, the second key-value pair overwrites the first, because a dictionary can hold only one value per key.
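
A quick illustration of that overwriting, using hard-coded pairs taken from the question's data rather than a file (the variable names are only for this example):

```python
# (key, value) pairs as they would come from the first two columns
pairs = [('AM', '75019'), ('AM', '75021'), ('AM', '75015')]

plain = {}
for key, value in pairs:
    plain[key] = value  # each assignment replaces the previous value for 'AM'

print(plain)  # {'AM': '75015'}
```

Only the last value survives, so a plain dict loses two of the three codes here.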

If you want to keep all the values that share a key, have a look at collections.defaultdict.

Here is some sample code:

In [1]: from collections import defaultdict

In [2]: lines = tuple(open('test.txt', 'r'))

In [3]: data_dict = defaultdict(list)

In [4]: for line in lines:
   ...:     data_dict[line.split('|')[0]].append(line.split('|')[1])
   ...:

In [5]: data_dict
Out[5]: defaultdict(list, {'AM': ['75019', '75021', '75015']})


Upvotes: 0

Nipun Madan

Reputation: 169

Add file.txt to your project, put the code below in a new file named extractinfo.py, and run it. The result is a list of one-entry dictionaries, one per line of the file.

# Python 2 syntax, matching the question (2.6).
output = []
with open('file.txt', 'r') as f:
    for singleLine in f:
        fields = singleLine.strip().split('|')
        if len(fields) < 2:
            continue  # skip blank or malformed lines (e.g. a trailing newline)
        # one single-entry dict per line, so repeated keys are all kept
        output.append({fields[0]: fields[1]})
print "output"
print output


Upvotes: 1

Jakub Bláha

Reputation: 1619

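The following code does what you want:

with open('somefile.txt', 'r') as f:
    # dict() with a generator also runs on Python 2.6, which lacks dict comprehensions.
    # A repeated key keeps only its last value.
    d = dict(line.split('|')[:2] for line in f)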

Upvotes: 0

Paronax

Reputation: 1

You may want to use the split function.

Splitting each line on the '|' separator gives you several tokens. For your purpose you only need the first two.

Here is a small snippet:

ze_dict = {}
ze_file = open(my_file_path, 'r')
ze_lines = ze_file.read().splitlines()
for l in ze_lines:
    ze_tokens = l.split('|')
    ze_dict[ze_tokens[0]] = ze_tokens[1]
ze_file.close()

Of course, you may want to add error handling to this snippet.

Please note this is not the most Pythonic way to do this (see the other answers).
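
The error handling mentioned above could look roughly like the sketch below. The function name parse_pairs, the choice to collect skipped line numbers, and the 'PK' row in the sample are all assumptions for illustration, not part of the original snippet.

```python
def parse_pairs(lines, sep='|'):
    """Return (pairs, skipped): a dict of first-two-column pairs and the
    1-based numbers of lines that could not be parsed."""
    pairs = {}
    skipped = []
    for number, line in enumerate(lines, 1):
        tokens = line.strip().split(sep)
        if len(tokens) < 2 or not tokens[0]:
            skipped.append(number)  # blank line, missing separator, or empty key
            continue
        pairs[tokens[0]] = tokens[1]  # a repeated key keeps its last value
    return pairs, skipped

# Hypothetical input: one good row, a blank line, a malformed row, another good row.
sample = ['AM|75019|some text\n', '\n', 'no separator here\n', 'PK|51001|other\n']
pairs, skipped = parse_pairs(sample)
print(pairs)
print(skipped)
```

Returning the skipped line numbers instead of raising lets the caller decide whether malformed input is fatal.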

Upvotes: 0
