Reputation: 927
This seems like such a basic thing to do in Python that it should almost be a default option. I have a text file that has lines such as
123, [12, 23, 45, 67]
The second array is variable in length. How do I read this in? For whatever reason I cannot find any documentation on how to deal with '[' or ']', which one might argue are among the most basic characters in Python.
np.loadtxt was a bust; apparently it only handles the simplest file formats.
np.genfromtxt was a bust because of the missing columns. BTW, one would like to believe the missing_values functionality could be helpful here. It would be useful to know what, if anything, missing_values actually does (the documentation does not explain it clearly).
I tried the np.fromstring route which gives me
['123', '[12', '23', '45', '67]']
Presumably I could parse this item by item to deal with the '[' and ']', but at that point I have just written my own file reader for a fairly basic Python construct!
As for the desired output, at this stage I would settle for almost anything. The obvious construct would be line by line of the form
[123, [12, 23, 45, 67]]
Upvotes: 0
Views: 105
Reputation: 2041
The default option is eval. It lets you evaluate Python expressions in strings. It's a security hazard though, see e.g. this question. But ast.literal_eval should be okay. For example:
from ast import literal_eval
with open("name of file") as fh:
    # each line such as "123, [12, 23, 45, 67]" is parsed as a Python literal
    data = [literal_eval(line) for line in fh]
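A slightly fuller sketch of the same idea, in case it helps. It assumes every non-blank line has exactly the form `scalar, [list]`; the helper name `read_rows` and the in-memory sample standing in for the file are made up for illustration. Note that a line like `123, [12, 23]` is parsed as a tuple, so it is converted to the nested-list form asked for in the question.

```python
from ast import literal_eval
from io import StringIO

def read_rows(fh):
    """Parse each non-blank line into [scalar, [values...]] (hypothetical helper)."""
    rows = []
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines; literal_eval cannot parse an empty string
        first, rest = literal_eval(line)  # "123, [12, 23]" evaluates to a tuple
        rows.append([first, rest])
    return rows

# StringIO stands in for the real file here (assumed sample data)
sample = StringIO("123, [12, 23, 45, 67]\n\n7, [1]\n")
rows = read_rows(sample)
print(rows)
```

With a real file, the same function can be called as `read_rows(fh)` inside the `with open(...)` block.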
Upvotes: 1
Reputation: 231385
loadtxt and genfromtxt parse a line, starting with a simple split.
In [360]: '123, [12, 23, 45, 67]'.split(',')
Out[360]: ['123', ' [12', ' 23', ' 45', ' 67]']
Then they try to convert the individual strings. Some convert easily to ints or floats. The ones with [ and ] don't. Handling those is not trivial.
The csv reader that comes with Python can handle quoted text, e.g.
'one, "twenty, three", four'
I have not played with it enough to know whether it can treat [] as quotes or not.
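For what it's worth, here is a small sketch of the csv behaviour described above. As far as I know, the reader's quotechar has to be a single character, so it cannot express a distinct opening and closing bracket; the second call shows what happens to the bracketed line. The sample strings are just illustrations.

```python
import csv

# Quoted text: the comma inside the double quotes is not treated as a delimiter.
# skipinitialspace=True drops the blank after each comma so the quote is recognised.
quoted = 'one, "twenty, three", four'
fields = next(csv.reader([quoted], skipinitialspace=True))
print(fields)

# Bracketed text: brackets are ordinary characters, so the list is split apart.
bracketed = '123, [12, 23, 45, 67]'
pieces = next(csv.reader([bracketed], skipinitialspace=True))
print(pieces)
```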
Your bracketed text is easier to parse if you use a different delimiter outside the brackets than inside them, e.g.
In [371]: l1='123; [12, 23, 45, 67]'.split(';')
In [372]: l1
Out[372]: ['123', ' [12, 23, 45, 67]']
In [373]: l2=l1[1].strip().strip(']').strip('[').split(',')
In [374]: l2
Out[374]: ['12', ' 23', ' 45', ' 67']
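If the file cannot be changed, a similar manual parse also works on the original comma-separated line by splitting only at the first comma. This is a minimal sketch that assumes integer values and a single bracketed list per line; `parse_line` is a made-up name.

```python
def parse_line(line):
    """Split 'scalar, [a, b, ...]' into [scalar, [a, b, ...]] (hypothetical helper)."""
    # partition splits at the first comma only, leaving the bracketed part intact
    head, _, tail = line.partition(',')
    # remove surrounding whitespace, then any leading/trailing bracket characters
    inner = tail.strip().strip('[]')
    # int() tolerates surrounding blanks; empty tokens (from '[]') are skipped
    values = [int(tok) for tok in inner.split(',') if tok.strip()]
    return [int(head), values]

row = parse_line('123, [12, 23, 45, 67]')
print(row)
```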
As Warren commented, plain CSV is something of an industry standard, and is used in many languages. The use of brackets and such has not been standardized. But there are data exchange languages like XML, JSON and YAML, as well as non-text data files (e.g. HDF5).
JSON example:
In [377]: json.loads('[123, [12, 23, 45, 67]]')
Out[377]: [123, [12, 23, 45, 67]]
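The lines in the question happen to be one pair of brackets short of valid JSON, so one possible approach is to wrap each line before handing it to json.loads. This is only a sketch under that assumption (numbers and bracketed lists only); `parse_json_line` is a made-up name.

```python
import json

def parse_json_line(line):
    """Wrap a 'scalar, [list]' line in brackets and parse it as JSON (hypothetical helper)."""
    # '123, [12, 23]' becomes '[123, [12, 23]]', which is a valid JSON array
    return json.loads('[' + line.strip() + ']')

parsed = parse_json_line('123, [12, 23, 45, 67]\n')
print(parsed)
```

Unlike eval, json.loads only ever builds plain data (lists, numbers, strings and so on), so it does not execute anything from the file.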
Upvotes: 2