Allyson

Reputation: 115

How to group array of the same name using Python?

I have over a thousand array categories in a text file, for example:

Category A1 and Category A2 (arrays in MATLAB code):

A1={[2,1,2]};
A1={[4,2,1,2,3]};
A2={[3,3,2,1]};
A2={[4,4,2,2]};
A2={[2,2,1,1,1]};

I would like to use Python to help me read the file and group them into:

A1=[{[2,1,2]} {[4,2,1,2,3]}];  
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

Upvotes: 1

Views: 917

Answers (2)

Padraic Cunningham

Reputation: 180540

Use a dict to group the lines. I presume you mean grouping them as strings, since the values are MATLAB syntax and not valid Python containers:

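from collections import OrderedDict
from pprint import pprint as pp

od = OrderedDict()
with open("infile") as f:
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        # Split only on the first "=" into the variable name and its value.
        name, data = line.split("=", 1)
        # Keep the value as a string, minus the trailing ";" and newline.
        od.setdefault(name, []).append(data.rstrip(";\n"))

pp(list(od.values()))

Output:

[['{[2,1,2]}', '{[4,2,1,2,3]}'],
 ['{[3,3,2,1]}', '{[4,4,2,2]}', '{[2,2,1,1,1]}']]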
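If you prefer, collections.defaultdict(list) does the same grouping without calling setdefault on every line. This is an alternative to the OrderedDict approach above, not the answer's original code; the function name group_lines and the sample lines are hypothetical, and it assumes every non-blank line has the form NAME=VALUE;. Because it accepts any iterable of lines, you can pass it an open file object directly.

```python
from collections import defaultdict


def group_lines(lines):
    """Group 'NAME={...};' lines by NAME, keeping each value as a string."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        name, data = line.split("=", 1)  # split only on the first '='
        groups[name.strip()].append(data.strip().rstrip(";"))
    return dict(groups)


sample = [
    "A1={[2,1,2]};\n",
    "A1={[4,2,1,2,3]};\n",
    "A2={[3,3,2,1]};\n",
]
print(group_lines(sample))
```

With a real file this becomes `with open("infile") as f: groups = group_lines(f)`.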

To write the grouped data back to your file (this overwrites infile):

with open("infile", "w") as f:
    # One line per variable: NAME=[{...} {...}];
    for k, v in od.items():
        f.write("{}=[{}];\n".format(k, " ".join(v)))

Output:

A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];

This is exactly your desired output: the semicolon is removed from each sub-array, the sub-arrays are grouped by name, and a single semicolon is added at the end of each group to keep the data valid in your MATLAB file.

collections.OrderedDict keeps the groups in the order their names first appear in the original file. A plain dict only guarantees that order from Python 3.7 onward; on older versions its order is arbitrary.
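A quick way to check the ordering behaviour with the same setdefault pattern; the names list below is made up for illustration:

```python
from collections import OrderedDict

names = ["A2", "A1", "A2", "B7"]

od = OrderedDict()
d = {}
for name in names:
    # Each key is created the first time its name is seen.
    od.setdefault(name, []).append(name)
    d.setdefault(name, []).append(name)

print(list(od))  # ['A2', 'A1', 'B7']
print(list(d))   # same order, but only guaranteed on Python 3.7+
```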

A safer approach when updating a file is to write to a temporary file first and then replace the original with it, using tempfile.NamedTemporaryFile and shutil.move:

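from collections import OrderedDict
from shutil import move
from tempfile import NamedTemporaryFile

od = OrderedDict()
# mode="w" opens the temp file in text mode; the default mode is binary ("w+b").
with open("infile") as f, NamedTemporaryFile("w", dir=".", delete=False) as temp:
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        name, data = line.split("=", 1)
        od.setdefault(name, []).append(data.rstrip("\n;"))
    for k, v in od.items():
        temp.write("{}=[{}];\n".format(k, " ".join(v)))
# Replace the original only after the temp file has been fully written and closed.
move(temp.name, "infile")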

If the code raises an error inside the loop, or your computer crashes during the write, the original file is left intact.
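On Python 3.3+, os.replace is an alternative to shutil.move for the final step: when the temporary file and the target are on the same filesystem, the rename is atomic on POSIX systems. The sketch below wraps the whole procedure in a hypothetical regroup_file(path) helper and creates the temporary file in the target's own directory so both live on the same filesystem.

```python
import os
import tempfile
from collections import OrderedDict


def regroup_file(path):
    """Group the lines of `path` by variable name and rewrite it via a temp file."""
    groups = OrderedDict()
    with open(path) as f:
        for line in f:
            if not line.strip():  # skip blank lines
                continue
            name, data = line.split("=", 1)
            groups.setdefault(name.strip(), []).append(data.strip().rstrip(";"))

    # Same directory as the target, so os.replace is a rename, not a copy.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as temp:
        for name, items in groups.items():
            temp.write("{}=[{}];\n".format(name, " ".join(items)))
    # Overwrites `path` if it already exists.
    os.replace(temp.name, path)
```

Usage is simply `regroup_file("infile")`. If an exception occurs before the replace, the temporary file is left behind and the original is untouched. Note that tempfile creates the file readable and writable only by your user, so the rewritten file will not keep the original's permissions; if that matters, call shutil.copymode(path, temp.name) before the replace.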

Upvotes: 4

Kasravnd

Reputation: 107347

You can loop over your lines, split each one on =, use str.strip to remove the braces around the list, and pass the result to ast.literal_eval to get a Python list. Finally, collect everything in a dictionary using its setdefault method:

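import ast

d = {}
with open('file_name') as f:
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        var, set_ = line.split('=', 1)
        # Strip the braces, ";" and newline, leaving e.g. "[2,1,2]",
        # which ast.literal_eval turns into a Python list.
        d.setdefault(var, []).append(ast.literal_eval(set_.strip("{}\n;")))
print(d)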

Result:

{'A1': [[2, 1, 2], [4, 2, 1, 2, 3]], 'A2': [[3, 3, 2, 1], [4, 4, 2, 2], [2, 2, 1, 1, 1]]}
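ast.literal_eval only accepts Python literal syntax, so this relies on the arrays being comma-separated as in your sample; a MATLAB row written with spaces, such as [2 1 2], would raise a SyntaxError. The hypothetical helper below parses a single line and also tolerates spaces around =:

```python
import ast


def parse_line(line):
    """Turn 'A1={[2,1,2]};' into ('A1', [2, 1, 2])."""
    name, rhs = line.split("=", 1)
    # Drop surrounding whitespace, then the trailing ';', then the cell braces.
    literal = rhs.strip().strip(";").strip("{}")
    return name.strip(), ast.literal_eval(literal)


print(parse_line("A2 = {[3,3,2,1]};\n"))
```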

If you want the result in exactly your expected format, you can do:

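d = {}
with open('ex.txt') as f, open('new', 'w') as out:
    for line in f:
        if not line.strip():  # skip blank lines
            continue
        var, set_ = line.split('=', 1)
        # Keep the value as a string, minus the trailing ";" and newline.
        d.setdefault(var, []).append(set_.strip(";\n"))
    print(d)
    # Write one grouped line per variable: NAME=[{...} {...}];
    for i, j in d.items():
        out.write('{}=[{}];\n'.format(i, ' '.join(j)))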

Finally, the file new will contain:

A1=[{[2,1,2]} {[4,2,1,2,3]}];
A2=[{[3,3,2,1]} {[4,4,2,2]} {[2,2,1,1,1]}];
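If you have already parsed the values into lists with ast.literal_eval, as in the first snippet, you can rebuild the grouped MATLAB text from them. A minimal sketch, using a hypothetical to_matlab helper that assumes every element's str() is valid MATLAB (true for plain integers):

```python
def to_matlab(groups):
    """Format {'A1': [[2, 1, 2], ...]} as lines like 'A1=[{[2,1,2]} ...];'."""
    lines = []
    for name, arrays in groups.items():
        # Rebuild each array without spaces, e.g. [2, 1, 2] -> '{[2,1,2]}'.
        cells = ["{[" + ",".join(str(x) for x in arr) + "]}" for arr in arrays]
        lines.append("{}=[{}];".format(name, " ".join(cells)))
    return "\n".join(lines) + "\n"


d = {'A1': [[2, 1, 2], [4, 2, 1, 2, 3]],
     'A2': [[3, 3, 2, 1], [4, 4, 2, 2], [2, 2, 1, 1, 1]]}
print(to_matlab(d), end="")
```

Working from parsed lists rather than raw strings means the numbers are validated on the way in, at the cost of losing any original formatting.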

Upvotes: 3
