Reputation: 75
I'm trying to read items from a .txt file that has the following:
294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]
312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]
The way I want to extract the data is:
294.nii.gz
[9, 46, 54]
[36, 48, 44]
...
N.B. all the items have the same number of 3D coordinates.
So far I can read the data with the following code:
import os
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    print(item.split(' ')[0])
This only prints the item names:
294.nii.gz
296.nii.gz
312.nii.gz
How do I get the rest of the data in the format I need?
Upvotes: 0
Views: 76
Reputation: 81
Others have suggested using the dynamic evaluator eval in Python (and even ast.literal_eval, which definitely works), but there are ways to perform this kind of parsing without either.
Given that the coordinate list in the coor_downsampled.txt file is very JSON-like, we can parse it with the json module instead.
Some sources claim that json.loads is about 4x faster than eval and almost 7x faster than ast.literal_eval, so if speed matters to you, I'd recommend the faster option.
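If you would rather check the speed claim on your own machine than rely on quoted figures, a rough timing sketch might look like the following. The sample string is taken from the question; the `parsers` and `timings` names and the call count of 2000 are arbitrary choices for illustration, and the results will vary with Python version and hardware.

```python
import ast
import json
import timeit

# One coordinate list from the question, as it appears in the file.
s = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"

# The three parsers discussed in this thread.
parsers = {
    "json.loads": json.loads,
    "ast.literal_eval": ast.literal_eval,
    "eval": eval,
}

# Time 2000 calls of each parser on the same string.
timings = {
    label: timeit.timeit(lambda fn=fn: fn(s), number=2000)
    for label, fn in parsers.items()
}

# Print fastest first; absolute numbers depend on the machine.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label}: {seconds:.4f} s for 2000 calls")
```

For a file with only a handful of lines, the difference is unlikely to matter in practice.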
import json
coortxt = 'coor_downsampled.txt'
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split the line just like you did in your own example
    split_line = item.split(" ")
    # the "name" is the first element
    name = split_line[0]
    # here's the tricky part.
    coords = json.loads("".join(split_line[1:]))
    print(name)
    print(coords)
Let's break down this tricky line: coords = json.loads("".join(split_line[1:]))
split_line[1:] will give you everything past the first space, so something like this:
['[[9,', '46,', '54],', '[36,', '48,', '44],', '[24,', '19,', '46],', '[15,', '0,', '22]]']
But by wrapping it with "".join(), we can turn it into
'[[9,46,54],[36,48,44],[24,19,46],[15,0,22]]'
as a string instead.
Once we have it like that, we simply call json.loads() to get the actual list object:
[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
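As a variation on the same idea, splitting only on the first space avoids the split-then-join round trip and makes it easy to print one coordinate per line, which is the layout the question asks for. This is a minimal sketch; `parse_line` is a hypothetical helper name, and the sample line is copied from the question.

```python
import json


def parse_line(line):
    """Split 'name [[x, y, z], ...]' into (name, list of coordinates)."""
    # maxsplit=1 keeps the whole bracketed list together as one string
    name, coords_str = line.strip().split(" ", 1)
    return name, json.loads(coords_str)


sample = "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
name, coords = parse_line(sample)

# Print the name, then each 3D coordinate on its own line.
print(name)
for coord in coords:
    print(coord)
```

Each `coord` is a regular Python list of three integers, so it can be indexed or unpacked as needed.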
Upvotes: 1
Reputation: 2095
So you have the fun task of converting a string representation of a list to a list.
To do this, you can use the ast library, specifically the ast.literal_eval function.
According to the documentation:
Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.
This is NOT the same as using eval. From the docs:
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
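To make the difference from eval concrete, here is a small sketch of how ast.literal_eval treats a plain literal versus an expression that contains a function call. The `rejected` flag is a name introduced only for this illustration.

```python
import ast

# A plain list literal is accepted and converted to a real list.
print(ast.literal_eval("[[9, 46, 54], [36, 48, 44]]"))

# Anything beyond literals, such as a function call, is refused.
# eval() would have executed this string instead.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    rejected = True
    print("rejected:", type(exc).__name__)
```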
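You get the first part of the data with item.split(' ', 1)[0].
Then, you'll use item.split(' ', 1)[1] to get (for example) a string with contents "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]". Limiting the split to the first space matters here: item.split(' ')[1:] would return a list of fragments rather than a single string, and ast.literal_eval cannot parse that.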
If the crash risk described in the warning above is one you're willing to accept, here is a demonstration using ast:
import ast
list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
list_list = ast.literal_eval(list_str)
print(isinstance(list_list, list))
#Outputs True
print(list_list)
#Outputs [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
Tying it together with your code:
import os
import ast
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split only on the first space so the coordinate list stays one string
    name, coords_str = item.split(' ', 1)
    coords = ast.literal_eval(coords_str)
    # name and coords now contain your required data
    # use as needed
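If you want to keep the parsed data around rather than use it inside the loop, one option is to collect it into a dictionary keyed by file name. The sketch below is self-contained: it writes a throwaway file in the question's layout so it can run anywhere, and `read_coords` is a hypothetical helper name, not something from the question.

```python
import ast
import os
import tempfile


def read_coords(path):
    """Return {name: [[x, y, z], ...]} for every non-blank line in the file."""
    result = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # split only on the first space; the rest is the list literal
            name, coords_str = line.split(" ", 1)
            result[name] = ast.literal_eval(coords_str)
    return result


# Build a small temporary file with the same layout as the question's data.
sample = (
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]\n"
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

coords_by_name = read_coords(tmp_path)
os.remove(tmp_path)

# Print each name followed by its coordinates, one per line.
for name, coords in coords_by_name.items():
    print(name)
    for coord in coords:
        print(coord)
```

With your own data you would call `read_coords(coortxt)` directly and drop the temporary-file part.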
https://stackoverflow.com/a/10775909/5763413
How to convert string representation of list to a list?
Upvotes: 2