Richard

Reputation: 61279

Convert string matrix to numpy

I have matrices of arbitrary dimension, formatted similarly to the example below. They come from an external source and the formatting cannot be changed.

[[[1.65 0.53 0 1][0.99 1.41 0 1][0.38 1.37 0 1][0 0 1 1][1.10 0.69 0 1][0 0 1 1][0.60 1.21 0 1][0.99 1.04 0 1][1.86 1.20 0 1][0 0 1 1][1.66 0.68 0 1][0.96 0.75 0 1][0.86 0.80 0 1][1.13 0.97 0 1][1.86 1.48 0 1][0 0 1 1][0.71 1.10 0 1][1.43 0.58 0 1][1.34 0.63 0 1][1.37 1.45 0 1][0.36 1.08 0 1][0 0 1 1][0.60 1.18 0 1][1.08 0.64 0 1][0.99 0.58 0 1][1.57 1.16 0 1][0.87 1.39 0 1][0.48 1.21 0 1][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0]][[1.52 1.01 0 1][0.93 0.62 0 1][1.41 0.52 0 1][1.66 0.83 0 1][0 0 1 1][1.02 1.03 0 1][0.98 0.92 0 1][0 0 1 1][0.65 0.90 0 1][0 0 1 1][1.27 0.61 0 1][0.41 0.79 0 1][1.23 1.04 0 1][0.56 0.70 0 1][0 0 1 1][1.81 0.90 0 1][0 0 1 1][1.71 0.57 0 1][1.53 1.06 0 1][1.28 1.42 0 1][1.50 0.91 0 1][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0]][[0 0 1 1][0.53 1.17 0 1][0.24 0.54 0 1][1.88 0.68 0 1][0 0 1 1][1.33 0.68 0 1][0.32 0.55 0 1][1.28 0.73 0 1][0.49 1.13 0 1][1.45 1.28 0 1][0.66 1.47 0 1][0 0 1 1][0.76 1.10 0 1][1.95 0.78 0 1][0 0 1 1][0.56 0.61 0 1][0.84 1.05 0 1][1.07 0.59 0 1][1.79 0.95 0 1][1.93 1.02 0 1][1.93 1.16 0 1][0 0 1 1][0.55 0.58 0 1][0.29 1.13 0 1][1.46 0.50 0 1][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0]][[1.71 0.50 0 1][0.70 1.35 0 1][0 0 1 1][0.90 0.83 0 1][1.81 0.97 0 1][1.64 1.35 0 1][1.21 1.15 0 1][0.54 0.50 0 1][0 0 1 1][0.62 0.72 0 1][0.86 1.38 0 1][0 0 1 1][1.76 1.15 0 1][1.83 1.43 0 1][0.20 0.51 0 1][0.81 0.65 0 1][0 0 1 1][0.51 0.79 0 1][1.09 1.43 0 1][1.65 1.03 0 1][1.47 1.49 0 1][0 0 1 1][1.57 0.97 0 1][0.99 0.93 0 1][1.82 0.66 0 1][1.84 1.01 0 1][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0]][[0 0 1 1][1.36 0.94 0 1][1.61 0.64 0 1][0.99 1.03 0 1][1.43 1.12 0 1][1.09 1.16 0 1][0.40 1.40 0 1][0 0 1 1][0.86 0.56 0 1][0.54 0.80 0 1][0.77 1.04 0 1][0 0 1 1][1.38 0.61 0 1][0.37 1.38 0 1][1.12 1.28 0 1][0 0 1 1][1.87 0.67 0 1][1.75 0.52 0 1][0.31 0.52 0 1][0.99 0.88 0 1][0 0 1 1][1.38 1.30 0 1][0 0 0 0][0 0 0 0][0 
0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0][0 0 0 0]]]

How can I convert this into NumPy form? This answer suggests using fromstring; however, the documentation says this only works for 1D matrices.

Upvotes: 0

Views: 364

Answers (3)

jsbueno

Reputation: 110271

Without line breaks, that is anything but "pretty".

I think the fastest way to get there is a regular-expression search and replace that adds a "," between space-separated numbers and between "][". This is not very smart, however, and will break on corner cases (such as numbers ending in "."), so you may have to fine-tune the idea.

When I tried it here, the old saying proved itself correct once more: "if you have one problem that needs regular expressions, you have two problems" (unknown author).

The problem is that some of the numbers consist of a single digit. Once such a digit has been consumed as the right-hand side of one match (to place the comma after the preceding number), the regex engine can no longer use it as the left-hand side of the next match.

So we need lookahead and lookbehind assertions, written (?=...) and (?<=...), which let us match only the position where the "," should go without consuming the digits on either side.
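To make the single-digit problem concrete, here is a small sketch comparing a consuming pattern with the lookaround version. The `sample` string and the variable names are illustrative assumptions, not part of the original data.

```python
import re

sample = "[0 0 1 1]"  # hypothetical row made only of single-digit numbers

# Consuming pattern: each match swallows the digit on both sides of the gap,
# so the second "0" cannot also act as the left side of the next match.
consuming = re.sub(r"(\d)\s+(\d)", r"\1, \2", sample)

# Lookaround pattern: only the whitespace is consumed; the digits are
# merely asserted to be there, so every gap gets its comma.
lookaround = re.sub(r"(?<=\d)\s+(?=\d)", ", ", sample)

print(consuming)   # [0, 0 1, 1]
print(lookaround)  # [0, 0, 1, 1]
```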

Once the commas are in place, the string can be passed to ast.literal_eval to get a nested list structure, which can in turn be passed directly to numpy.array.

import numpy as np
import re
from ast import literal_eval

# a is the input string holding the matrix
b = re.sub(r"((?<=\d)\s+(?=\d)|(?<=\])\s*?(?=\[))", ", ", a)
c = np.array(literal_eval(b))

Of course, if there is always exactly one space between numbers and no spaces or line breaks between "][", plain string replacement without regexps is much simpler. Use the regexp if your input data has loose spacing.
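As a rough end-to-end sketch of the regex approach on loosely spaced input, the snippet below wraps the substitution in a helper. The names `SEPARATORS` and `parse_matrix` and the shortened `sample` string are assumptions made for illustration; the sample mixes double spaces, a space between "]" and "[", and line breaks.

```python
import re
from ast import literal_eval

import numpy as np

# Same two alternatives as above: whitespace between two digits,
# or an optional run of whitespace between "]" and "[".
SEPARATORS = re.compile(r"(?<=\d)\s+(?=\d)|(?<=\])\s*?(?=\[)")


def parse_matrix(text):
    """Insert the missing commas, then parse the result as a nested list."""
    with_commas = SEPARATORS.sub(", ", text)
    return np.array(literal_eval(with_commas))


# Hypothetical, shortened input with irregular spacing and line breaks
sample = "[[[1.65 0.53 0 1][0 0 1 1]] [[1.52  1.01 0 1]\n[0 0\n0 0]]]"
arr = parse_matrix(sample)
print(arr.shape)  # (2, 2, 4)
```

Because the pattern requires a digit on both sides of the gap, values with a leading minus sign would not get a comma in front of them; that is one of the corner cases that would need extra tuning.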

Upvotes: 0

Richard

Reputation: 61279

I ended up with this:

import ast
import numpy as np

def StringToMatrix(txtmat):
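    # Add the commas that Python's list syntax requires
    txtmat = txtmat.replace(" ", ",").replace("][", "],[")
    try:
        ret = np.array(ast.literal_eval(txtmat))
    except (ValueError, SyntaxError):
        # Malformed input: signal failure to the caller
        ret = None
    return ret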
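The plain-replace approach assumes exactly one space between numbers. The sketch below, using a hypothetical helper `add_commas_simple` and made-up inputs, shows what happens when that assumption does not hold: a double space turns into two commas and the result no longer parses.

```python
import ast


def add_commas_simple(text):
    """Same two replacements as above, without the NumPy conversion."""
    return text.replace(" ", ",").replace("][", "],[")


tight = "[[1 2][3 4]]"    # single spaces only: works
loose = "[[1  2] [3 4]]"  # double space: produces ",,"

print(add_commas_simple(tight))  # [[1,2],[3,4]]
print(add_commas_simple(loose))  # [[1,,2],[3,4]]

try:
    ast.literal_eval(add_commas_simple(loose))
    parsed_ok = True
except SyntaxError:
    parsed_ok = False

print(parsed_ok)  # False
```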

Upvotes: 0

Max Kaha

Reputation: 922

It depends. If the data is already a list, you can pass it to np.array() directly. If it is a string, you probably need to insert "," first to make it a valid Python list. Here is an example in case it is a string:

import ast
import numpy as np

myString = "<YOUR STRING ABOVE>"
myString = myString.replace(" ", ",")  # Replace [0 0 0 0] with [0,0,0,0]
myString = myString.replace("][", "],[")  # Replace [0,0,0,0][0,0,0,0] with [0,0,0,0],[0,0,0,0]
myList = ast.literal_eval(myString)  # Turn the string into a nested list
myArr = np.array(myList)  # Turn the list into a NumPy array

Hope it helps. If your data is already a list, you can skip straight to np.array(myList).

Dimensions of the array I created from the data above:

myArr.shape
(5, 33, 4)

Edit: Changed eval() to ast.literal_eval as suggested by @b_c

Upvotes: 2
