banikr

Reputation: 75

Reading items from .txt in specific order

I'm trying to read items from a .txt file that has the following:

294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]
312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]

The way I want to extract the data is:

  1. Get the item name: 294.nii.gz
  2. Item's coordinates serially: [9, 46, 54] [36, 48, 44] ...
  3. Get the next item:

N.B. all the items have the same number of 3D coordinates.

So far I can read the data with the following code:

coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]

for item in content:
    print(item.split(' ')[0])

This only prints the item names:

294.nii.gz
296.nii.gz
312.nii.gz

How do I get the rest of the data in the format I need?

Upvotes: 0

Views: 76

Answers (2)

dcronqvist

Reputation: 81

Others have suggested using Python's dynamic evaluator eval (or ast.literal_eval, which definitely works), but there are ways to do this kind of parsing without either.

Given that the formatting of the coordinate list in the coor_downsampled.txt file is very json-esque, we can parse it using the very cool json module instead.

NOTE:

There are sources claiming that json.loads is 4x faster than eval and almost 7x faster than ast.literal_eval. If you need the speed, I'd recommend using the faster option.
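The speed claim is easy to check on your own machine. The following is a minimal sketch using timeit from the standard library; the sample string is one coordinate list from the question, the call count of 10,000 is an arbitrary choice, and the resulting numbers will vary with hardware and Python version.

```python
import ast
import json
import timeit

# One coordinate list from the question, as it appears in the file.
sample = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"

# Both parsers should produce the same nested list for this input.
same_result = json.loads(sample) == ast.literal_eval(sample)

# Time 10,000 calls of each parser (the count is arbitrary).
json_time = timeit.timeit(lambda: json.loads(sample), number=10_000)
ast_time = timeit.timeit(lambda: ast.literal_eval(sample), number=10_000)

print(f"same result:      {same_result}")
print(f"json.loads:       {json_time:.4f} s")
print(f"ast.literal_eval: {ast_time:.4f} s")
```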

Complete example

import json

coortxt = 'coor_downsampled.txt'
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]

for item in content:
    # split the line just like you did in your own example
    split_line = item.split(" ")

    # the "name" is the first element
    name = split_line[0]

    # here's the tricky part.
    coords = json.loads("".join(split_line[1:]))
    print(name)
    print(coords)

Explanation

Let's break down this tricky line: coords = json.loads("".join(split_line[1:]))

split_line[1:] will give you everything past the first space, so something like this:

['[[9,', '46,', '54],', '[36,', '48,', '44],', '[24,', '19,', '46],', '[15,', '0,', '22]]']

But by wrapping it with a "".join(), we can turn it into

'[[9,46,54],[36,48,44],[24,19,46],[15,0,22]]' as a string instead.

Once we have it like that, we simply do json.loads() to get the actual list object

[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]].
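As a possible variation, splitting only at the first space keeps the coordinate list in one piece, so no join is needed. This is a sketch, not the only way to do it: parse_line is a hypothetical helper name, and the sample lines are copied from the question instead of being read from the file.

```python
import json

# Sample lines copied from the question; in practice they would come from the file.
lines = [
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]",
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]",
]


def parse_line(line):
    """Return (name, coords) for one line; hypothetical helper for illustration."""
    # Split only at the first space so the coordinate list stays one string.
    name, coords_str = line.split(" ", 1)
    return name, json.loads(coords_str)


parsed = [parse_line(line) for line in lines]

for name, coords in parsed:
    print(name)
    # Walk the coordinates one by one, in file order.
    for coord in coords:
        print(coord)
```

Each coord is a plain list of three integers, so it can be unpacked with x, y, z = coord if needed.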

Upvotes: 1

blackbrandt

Reputation: 2095

So you have the fun task of converting a string representation of a list to a list.

To do this, you can use the ast library, specifically the ast.literal_eval function.

Disclaimer:

According to the documentation:

Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.

This is NOT the same as using eval. From the docs:

Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.

This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
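To illustrate the difference from eval, here is a small sketch showing that ast.literal_eval accepts a list literal but raises ValueError for an expression containing a function call. The unsafe_input string is an illustrative example and does not come from the question's data.

```python
import ast

safe_input = "[[9, 46, 54], [36, 48, 44]]"
# Illustrative expression with a function call; not from the question's file.
unsafe_input = "__import__('os').getcwd()"

# A plain list literal is parsed into a real list.
parsed = ast.literal_eval(safe_input)

# Anything that is not a literal (such as a call) is refused with ValueError.
try:
    ast.literal_eval(unsafe_input)
    rejected = False
except ValueError:
    rejected = True

print(parsed)
print(rejected)
```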

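You get the first part of the data with item.split(' ')[0].

Then, use item.split(' ', 1)[1] to get the rest of the line as a single string with contents such as "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]". Note that item.split(' ')[1:] returns a list of fragments, not a string, and ast.literal_eval cannot parse that.

If the crash risk described in the warning above is one you're willing to accept, here is a demonstration using ast: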

import ast
list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
list_list = ast.literal_eval(list_str)
print(isinstance(list_list, list))
#Outputs True
print(list_list)
#Outputs [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]

Tying it together with your code:

import os
import ast

# coordir is the directory variable from your own code
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]

for item in content:
    # split only at the first space, so coords_str stays a single string
    name, coords_str = item.split(' ', 1)
    coords = ast.literal_eval(coords_str)
    # name and coords now contain your required data
    # use as needed
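For completeness, here is a self-contained sketch of the whole flow that collects the results in a dict keyed by item name. It writes two sample lines from the question to a temporary directory so it can run anywhere; that temporary directory stands in for coordir, and coords_by_name is a name chosen for illustration.

```python
import ast
import os
import tempfile

# Two sample lines from the question, written to a temporary file
# so that this sketch runs on its own.
sample = (
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]\n"
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]\n"
)

coords_by_name = {}  # illustrative name: maps item name to its coordinates

with tempfile.TemporaryDirectory() as coordir:  # stands in for the real coordir
    coortxt = os.path.join(coordir, "coor_downsampled.txt")
    with open(coortxt, "w") as f:
        f.write(sample)

    with open(coortxt) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only at the first space so the list stays one string.
            name, coords_str = line.split(" ", 1)
            coords_by_name[name] = ast.literal_eval(coords_str)

# Print each item name followed by its coordinates, in file order.
for name, coords in coords_by_name.items():
    print(name, *coords)
```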


Relevant posts:

https://stackoverflow.com/a/10775909/5763413

How to convert string representation of list to a list?

Upvotes: 2
