How to read python data types from a file

Question

This seems like such a basic thing to do in Python that it should almost be a default option. I have a text file with lines such as

123, [12, 23, 45, 67]

The second field is a list of variable length. How do I read this in? For whatever reason I cannot find a single piece of documentation on how to deal with '[' or ']', which one might argue are among the most basic characters in Python.

np.loadtxt was a bust; apparently it only handles the simplest of file formats.

np.genfromtxt was a bust, due to the missing columns. BTW one would like to believe the missing_values functionality could be helpful here. It would be useful to know what, if anything, missing_values actually does (it is not explained clearly in the documentation at all).

I tried the np.fromstring route which gives me

['123', '[12', '23', '45', '67]']

Presumably I could parse this item by item to deal with the '[' and ']' but at this stage I have just made my own python file reader to read in a fairly basic python construct!

As for the desired output, at this stage I would settle for almost anything. The obvious construct would be line by line of the form

[123, [12, 23, 45, 67]]

hpaulj · Accepted Answer

loadtxt and genfromtxt parse a line, starting with a simple split.

In [360]: '123, [12, 23, 45, 67]'.split(',')
Out[360]: ['123', ' [12', ' 23', ' 45', ' 67]']

then they try to convert the individual strings. Some convert easily to ints or floats. The ones with [ and ] don't. Handling those is not trivial.
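To see which pieces fail, here is a rough sketch that applies plain `int()` to each piece of the split. This only imitates the conversion step; it is not the actual loadtxt/genfromtxt code, and `try_int` is a made-up helper for illustration.

```python
pieces = '123, [12, 23, 45, 67]'.split(',')

def try_int(text):
    """Hypothetical helper: return int(text), or None if it does not parse."""
    try:
        return int(text)  # int() tolerates surrounding whitespace
    except ValueError:
        return None       # pieces that still carry '[' or ']' end up here

converted = [try_int(p) for p in pieces]
print(converted)  # [123, None, 23, 45, None]
```

The middle values convert fine; only the two pieces that still have a bracket attached are lost.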

The csv reader that comes with Python can handle quoted text, e.g.

 'one, "twenty, three", four'

I have not played with it enough to know whether it can treat [] as quotes or not.
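For reference, a minimal sketch of the quoted-text case with the standard csv module. I am assuming `skipinitialspace=True` here because the sample has a space after each comma; otherwise the quote does not sit at the start of the field. Note that `quotechar` is a single character, so there is no direct way to declare '[' and ']' as an opening/closing pair.

```python
import csv

line = 'one, "twenty, three", four'

# csv.reader accepts any iterable of lines, so a one-item list works.
# skipinitialspace=True drops the blank after each comma, which puts the
# double quote at the start of its field.
row = next(csv.reader([line], skipinitialspace=True))
print(row)  # ['one', 'twenty, three', 'four']
```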

Your bracketed text is easier to parse if you use different delimiters inside the brackets, e.g.

In [371]: l1='123; [12, 23, 45, 67]'.split(';')
In [372]: l1
Out[372]: ['123', ' [12, 23, 45, 67]']
In [373]: l2=l1[1].strip().strip(']').strip('[').split(',')
In [374]: l2
Out[374]: ['12', ' 23', ' 45', ' 67']
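The same steps can be wrapped in a small function that also converts the strings to ints. This is only a sketch: `parse_line` and its `sep` argument are names I made up, and it assumes each line is one integer, a separator, and one bracketed list of integers.

```python
def parse_line(line, sep=';'):
    """Hypothetical helper: turn '123; [12, 23]' into [123, [12, 23]]."""
    # Split only at the first separator, so the bracketed part stays whole.
    head, tail = line.split(sep, 1)
    # Drop surrounding whitespace, then the enclosing brackets.
    inner = tail.strip().strip('[]')
    # The 'if' skips the empty string left behind by an empty list '[]'.
    values = [int(v) for v in inner.split(',') if v.strip()]
    return [int(head), values]

print(parse_line('123; [12, 23, 45, 67]'))  # [123, [12, 23, 45, 67]]
```

Because the split is limited to the first separator, `parse_line('123, [12, 23, 45, 67]', sep=',')` should handle the original comma-only format as well.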

As Warren commented, plain CSV is something of an industry standard, and used in many languages. The use of brackets and such has not been standardized. But there are data exchange languages like XML, JSON and YAML, as well as non-text data files (e.g. HDF5).

JSON example:

In [376]: import json
In [377]: json.loads('[123, [12, 23, 45, 67]]')
Out[377]: [123, [12, 23, 45, 67]]
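Your lines are only missing the outer brackets to be valid JSON, so one option is to add them before parsing. A minimal sketch, assuming every non-blank line has that form; `read_rows` is a made-up name, and `io.StringIO` just stands in for an open file here.

```python
import io
import json

def read_rows(lines):
    """Hypothetical helper: parse lines like '123, [12, 23]' into nested lists."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Wrapping the line in brackets makes it a JSON array.
        rows.append(json.loads('[' + line + ']'))
    return rows

# Stand-in for: with open('data.txt') as f: rows = read_rows(f)
sample = io.StringIO('123, [12, 23, 45, 67]\n7, [1]\n')
rows = read_rows(sample)
print(rows)  # [[123, [12, 23, 45, 67]], [7, [1]]]
```

The result is a plain list of lists rather than a numpy array, since the inner lists differ in length.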
