Reputation: 437
I have a .txt file in the following format:
AM|75019|Caribbean from 15N to 18N between 80W and 85W|18.757950|-81.741300
AM|75021|Caribbean from 15N to 18N between 72W and 80W|18.757950|-81.741300
AM|75015|Caribbean approaches to the Windward Passage|15.133340|-68.139050
I want to extract only the first two columns to use as key:value. For example, AM:75019, AM:75021, etc. I'm new to Python (using 2.6) and am not sure how to do this. I've searched and found multiple answers that don't entirely make sense since there are multiple columns.
Upvotes: 1
Views: 5345
Reputation: 123501
Dictionaries in Python cannot have duplicate keys, so the closest thing you could do would be to store a list of values associated with each key.
Your file is composed of character-separated values, so using Python's csv module would make parsing the file into separate fields trivial.
Here's one way to accomplish what you want. Note that you could also use the collections.defaultdict class, which was added in Python 2.5, instead of defining one of your own as shown below:
import csv
from pprint import pprint

class ListDict(dict):
    """ Dictionary whose values are lists. """
    def __missing__(self, key):
        # Called on a missing key: store and return a new empty list.
        value = self[key] = []
        return value

filename = 'multi_col.csv'
lstdct = ListDict()
with open(filename, 'rb') as csvfile:
    for row in csv.reader(csvfile, delimiter='|'):
        key, value = row[:2]  # keep only the first two columns
        lstdct[key].append(value)

pprint(lstdct)  # -> {'AM': ['75019', '75021', '75015']}
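For comparison, the collections.defaultdict alternative mentioned above might look like the sketch below. It is written for Python 3 and uses io.StringIO with the three rows from the question in place of a real file, so the names sample and groups are illustrative assumptions rather than part of the original answer. On Python 2.6 you would open the file in 'rb' mode as in the code above.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question; io.StringIO stands in for an opened file.
sample = io.StringIO(
    "AM|75019|Caribbean from 15N to 18N between 80W and 85W|18.757950|-81.741300\n"
    "AM|75021|Caribbean from 15N to 18N between 72W and 80W|18.757950|-81.741300\n"
    "AM|75015|Caribbean approaches to the Windward Passage|15.133340|-68.139050\n"
)

groups = defaultdict(list)  # a missing key starts out as an empty list
for row in csv.reader(sample, delimiter='|'):
    if len(row) < 2:  # skip blank or malformed rows
        continue
    key, value = row[:2]  # keep only the first two columns
    groups[key].append(value)

print(dict(groups))  # {'AM': ['75019', '75021', '75015']}
```

The behaviour is the same as ListDict: the first access to a new key creates the list, so no explicit "is the key already there?" check is needed.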
Upvotes: 1
Reputation: 8400
I want to extract only the first two columns to use as key:value. For example, AM:75019, AM:75021, etc.
If a key is duplicated in a dict, the second key-value pair overwrites the first, because a dictionary can only hold one value per key.
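A minimal illustration of that overwriting, using the first two columns from the question (the names pairs and d are just for this sketch):

```python
# First two columns of the three sample lines in the question.
pairs = [('AM', '75019'), ('AM', '75021'), ('AM', '75015')]

d = {}
for key, value in pairs:
    d[key] = value  # each assignment replaces the previous value stored under 'AM'

print(d)  # {'AM': '75015'}
```

Only the last value survives, which is why a plain dict cannot hold AM:75019 and AM:75021 at the same time.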
If you want to keep all the values that share a key, have a look at defaultdict.
Here is some sample code:
In [1]: from collections import defaultdict
In [2]: lines = tuple(open('test.txt', 'r'))
In [3]: data_dict = defaultdict(list)
In [4]: for line in lines:
   ...:     data_dict[line.split('|')[0]].append(line.split('|')[1])
   ...:
In [5]: data_dict
Out[5]: defaultdict(list, {'AM': ['75019', '75021', '75015']})
Upvotes: 0
Reputation: 169
Follow these steps to get the expected result as an output list.
Add file.txt to the project, put the code below in a new file named extractinfo.py, and run it. Each line becomes its own one-entry dict, so repeated keys are all kept:
f = open('file.txt', 'r')
content = f.read()
allLines = content.splitlines()  # no empty last item after a trailing newline
output = []
for singleLine in allLines:
    if not singleLine.strip():
        continue  # skip blank lines
    fields = singleLine.split('|')
    extractedJSON = {}
    extractedJSON[fields[0]] = fields[1]  # first column -> second column
    output.append(extractedJSON)
print "output"
print output
f.close()
Upvotes: 1
Reputation: 1619
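The following code will do what you want. Note that repeated keys overwrite each other, so only the last value per key is kept. The {key: value for ...} dict comprehension syntax needs Python 2.7 or later, so on 2.6 the dict is built from a generator expression instead:
with open('somefile.txt', 'r') as f:
    # Each line yields a [key, value] pair from its first two columns.
    d = dict(line.split('|')[:2] for line in f)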
Upvotes: 0
Reputation: 1
You may want to use the split function.
Splitting each line on the '|' separator gives you several tokens. For your purpose you only need the first two.
Here is a small snippet:
ze_dict = {}
ze_file = open(my_file_path, 'r')
ze_lines = ze_file.read().splitlines()
for l in ze_lines:
    ze_tokens = l.split('|')
    ze_dict[ze_tokens[0]] = ze_tokens[1]
ze_file.close()
Of course, you may want to add error handling to this snippet.
Please note this is not the most Pythonic way to do this (see the other answers).
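As a rough sketch of what such error handling could look like, the version below skips lines that do not contain at least two '|'-separated tokens and reports their line numbers. The function name parse_lines, the skipped list, and the malformed sample lines are assumptions made up for this illustration. It takes any iterable of lines, so an open file object would work in place of the sample list.

```python
def parse_lines(lines):
    """Map first token -> second token; return the dict and the skipped line numbers."""
    result = {}
    skipped = []
    for number, line in enumerate(lines, 1):  # 1-based line numbers
        tokens = line.strip().split('|')
        if len(tokens) < 2:
            skipped.append(number)  # blank line or no '|' separator
            continue
        result[tokens[0]] = tokens[1]  # a repeated key still overwrites
    return result, skipped


# Two rows from the question plus two deliberately bad lines.
sample = [
    "AM|75019|Caribbean from 15N to 18N between 80W and 85W|18.757950|-81.741300",
    "",
    "no separator here",
    "AM|75021|Caribbean from 15N to 18N between 72W and 80W|18.757950|-81.741300",
]

ze_dict, skipped = parse_lines(sample)
print(ze_dict)   # {'AM': '75021'}
print(skipped)   # [2, 3]
```

Collecting the skipped line numbers instead of raising keeps one bad line from aborting the whole file. Whether that is the right trade-off depends on how much you trust the input.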
Upvotes: 0