Reputation: 1281
I have a text file containing 10 columns of numbers. I would like to create a dictionary in which the first three numbers of each row serve as a key for looking up two further numbers, in columns 6 and 7 of the same row. I have been trying to do this with the numpy.loadtxt function (in Python 2.7), but I am running into difficulties with the dtype argument. Is this the correct approach, or is there a simpler way? If so, what is the correct way to lay out the function call?
Many thanks, and please let me know if any clarification is required.
Upvotes: 2
Views: 634
Reputation: 879093
Given the whitespace-separated column format of your data,
1 0 0 617.09 0.00 9.38 l 0.0000E+00
2 0 0 7169.00 6978.44 94.10 o 0.1913E-05
3 0 0 366.08 371.91 14.06 o 0.6503E-03
4 0 0 5948.04 5586.09 52.95 o 0.2804E-05
5 0 0 3756.34 3944.63 50.69 o 0.6960E-05
-11 1 0 147.27 93.02 23.25 o 0.1320E-02
-10 1 0 -2.31 5.71 9.57 o 0.2533E-02
I think it would be easiest to just use Python string manipulation tools like split to parse the file:
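def to_float(item):
    # Convert to float where possible; leave non-numeric fields as strings.
    try:
        return float(item)
    except ValueError:
        return item

def formatter(lines):
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        yield [to_float(item) for item in line.split()]

dct = {}
with open('data') as f:
    for row in formatter(f):
        # Key: first three columns; value: columns 6 and 7 (indices 5 and 6).
        dct[tuple(row[:3])] = row[5:7]
print(dct)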
yields
{(-11.0, 1.0, 0.0): [23.25, 'o'], (4.0, 0.0, 0.0): [52.95, 'o'], (1.0, 0.0, 0.0): [9.38, 'l'], (-10.0, 1.0, 0.0): [9.57, 'o'], (3.0, 0.0, 0.0): [14.06, 'o'], (5.0, 0.0, 0.0): [50.69, 'o'], (2.0, 0.0, 0.0): [94.1, 'o']}
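If integer keys are preferable to the float keys shown above, a variant along these lines may help. It is only a sketch: three rows are copied from the sample data into a string, and the names SAMPLE, parse_value and build_lookup are made up for illustration.

```python
import io

# Three rows copied from the sample data above, held in memory for the demo.
SAMPLE = """\
1 0 0 617.09 0.00 9.38 l 0.0000E+00
2 0 0 7169.00 6978.44 94.10 o 0.1913E-05
-11 1 0 147.27 93.02 23.25 o 0.1320E-02
"""

def parse_value(item):
    """Return item as a float when possible, otherwise unchanged."""
    try:
        return float(item)
    except ValueError:
        return item

def build_lookup(lines):
    """Map the first three columns (as ints) to the values in columns 6 and 7."""
    lookup = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = tuple(int(field) for field in fields[:3])
        lookup[key] = [parse_value(field) for field in fields[5:7]]
    return lookup

lookup = build_lookup(io.StringIO(SAMPLE))
print(lookup[(1, 0, 0)])    # [9.38, 'l']
```

Integer keys avoid having to write lookups as (1.0, 0.0, 0.0), although in Python (1, 0, 0) and (1.0, 0.0, 0.0) hash and compare equal, so either form finds the same entry.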
Original answer:
genfromtxt has a parameter dtype, which when set to None causes genfromtxt to try to guess the appropriate dtype:
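import numpy as np
# dtype=None lets genfromtxt infer the type of each column.
arr = np.genfromtxt('data', dtype=None)
# Key: first three columns; value: columns 6 and 7 (indices 5 and 6).
dct = {tuple(row[:3]): row[5:7] for row in arr}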
For example, with data like this:
1 2 3 4 5 6 7 8 9 10
1 2 4 4 5 6 7 8 9 10
1 2 5 4 5 6 7 8 9 10
dct gets set to
{(1, 2, 5): array([6, 7]), (1, 2, 4): array([6, 7]), (1, 2, 3): array([6, 7])}
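When the columns do not all share one type (integers in some, floats in others), genfromtxt with dtype=None returns a structured array of records rather than a plain 2-D array. The following is a rough sketch of one way to handle that case, assuming a reasonably recent NumPy that accepts a text stream; the sample values are invented, and usecols limits the read to the key and value columns.

```python
import io
import numpy as np

# Hypothetical all-numeric data: integer keys in columns 1-3, floats elsewhere.
text = """\
1 0 0 617.09 0.00 9.38 0.5 0.0
2 0 0 7169.00 6978.44 94.10 1.5 0.0
3 0 0 366.08 371.91 14.06 2.5 0.0
"""

# Read only the key columns (0-2) and the value columns (5-6), zero-based.
arr = np.genfromtxt(io.StringIO(text), dtype=None, usecols=(0, 1, 2, 5, 6))

# tolist() converts each row into plain Python ints and floats.
dct = {tuple(rec[:3]): tuple(rec[3:]) for rec in arr.tolist()}
print(dct[(1, 0, 0)])   # (9.38, 0.5)
```

Going through tolist() keeps the keys and values as ordinary Python tuples, so they can be sliced and hashed without depending on the record field names.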
Upvotes: 1
Reputation: 358
For clarity, a complete example of the above (correct) answer might look like:
import numpy as np

# Write a small sample file ('w' truncates any existing file).
with open("data.txt", "w") as f:
    f.write("1 2 3 4 5 6 7 8 9 10\n")
    f.write("1 2 4 4 5 6 7 8 9 10\n")
    f.write("1 2 5 4 5 6 7 8 9 10\n")

arr = np.genfromtxt("data.txt", dtype=None)
# Columns 6 and 7 are at zero-based indices 5 and 6.
dct = {tuple(row[:3]): row[5:7] for row in arr}
Which would result in:
{(1, 2, 3): array([6, 7]), (1, 2, 4): array([6, 7]), (1, 2, 5): array([6, 7])}
It may be apparent, but note that dictionary entries will be overwritten whenever more than one row has identical values in the first three columns.
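One way to keep every row that shares a key, rather than only the last one, is to collect the values in a list. This is a minimal sketch using collections.defaultdict; the rows are made-up sample data in which the first two rows deliberately share the same first three columns.

```python
from collections import defaultdict

# Hypothetical rows: the first two share the same first three columns.
rows = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [1, 2, 3, 4, 5, 60, 70, 8, 9, 10],
    [1, 2, 5, 4, 5, 6, 7, 8, 9, 10],
]

grouped = defaultdict(list)
for row in rows:
    # Append rather than assign, so earlier rows with the same key survive.
    grouped[tuple(row[:3])].append(row[5:7])

print(grouped[(1, 2, 3)])   # [[6, 7], [60, 70]]
print(grouped[(1, 2, 5)])   # [[6, 7]]
```

Each key then maps to a list of value pairs, one per matching row, in file order.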
Upvotes: 1