Reputation: 10697
Let's say I have a file that looks like this:
text a
bla bla
1 2 3
4 5 6
text b
bla
7 8 9
10 11 12
text c
bla bla bla
13 14 15
16 17 18
I am trying to extract only the number arrays and place them into a numpy
array:
array([[ 1,  2,  3,
         4,  5,  6],
       [ 7,  8,  9,
        10, 11, 12],
       [13, 14, 15,
        16, 17, 18]])
I tried using np.genfromtxt('test.txt',usecols=[0,1,2],invalid_raise=False), which returns:
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[ 10., 11., 12.],
[ nan, nan, nan],
[ 13., 14., 15.],
[ 16., 17., 18.]])
but it doesn't create sub-arrays, and it converts the text lines into NaNs. Is there a better way of doing this?
Upvotes: 1
Views: 204
Reputation: 53089
You could use itertools.groupby, along the lines of:
>>> import itertools
>>> import numpy as np
>>>
>>> content = """text a
...
... bla bla
...
... 1 2 3
... 4 5 6
...
... text b
...
... bla
...
... 7 8 9
... 10 11 12
...
... text c
...
... bla bla bla
...
... 13 14 15
... 16 17 18"""
>>>
>>> import io
>>> filelike = io.StringIO(content)
>>> # you may want to refine this test
>>> allowed_characters = set('0123456789 ')
>>> def isnumeric(line):
...     return set() < set(line.strip()) <= allowed_characters
...
>>> [np.genfromtxt(gr) for k, gr in itertools.groupby(filelike, isnumeric) if k]
[array([[1., 2., 3.],
       [4., 5., 6.]]), array([[ 7.,  8.,  9.],
       [10., 11., 12.]]), array([[13., 14., 15.],
       [16., 17., 18.]])]
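The result above is a list of separate arrays. If a single array is wanted, the blocks can be stacked, provided every block has the same shape. The following is only a sketch under that assumption: the helper name `load_blocks` is hypothetical, the sample text mirrors the file in the question, and plain `int` parsing is swapped in for `np.genfromtxt` so the result has an integer dtype.

```python
import io
import itertools

import numpy as np

ALLOWED = set('0123456789 ')


def isnumeric(line):
    # True for non-empty lines made up only of digits and spaces
    return set() < set(line.strip()) <= ALLOWED


def load_blocks(lines):
    """Hypothetical helper: one 2-D integer array per run of numeric lines, stacked."""
    blocks = [
        np.array([[int(tok) for tok in row.split()] for row in group])
        for is_num, group in itertools.groupby(lines, isnumeric)
        if is_num
    ]
    # np.stack requires all blocks to share one shape (an assumption here)
    return np.stack(blocks)


sample = io.StringIO(
    "text a\nbla bla\n1 2 3\n4 5 6\n"
    "text b\nbla\n7 8 9\n10 11 12\n"
    "text c\nbla bla bla\n13 14 15\n16 17 18\n"
)
arr3d = load_blocks(sample)              # one 2x3 sub-array per block
arr2d = arr3d.reshape(len(arr3d), -1)    # one flattened row per block
```

Here `arr3d` keeps each block as its own sub-array, while `arr2d` flattens each block into a single row, which is the layout the second answer produces with `reshape((-1, 6))`.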
Upvotes: 1
Reputation: 21
You'll likely have to resort to a bit of "manual" parsing. Assuming the file has the form given, here's one solution (there are surely others):
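import numpy as np

filename = 'test.txt'  # the file name used in the question

def parser(fname):
    # Assumes the file repeats in 7-line blocks with the numeric rows at
    # offsets 5 and 6. For the 4-line blocks shown in the question, use
    # i % 4 and (2, 3) instead.
    with open(fname) as fh:
        for i, line in enumerate(fh):
            p = i % 7
            if p not in (5, 6):
                continue
            yield line.rstrip()

a = ' '.join(parser(filename))
arr = np.fromstring(a, dtype=int, sep=' ')
arr = arr.reshape((-1, 6))
print(arr)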
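The fixed offsets in `i % 7` tie the parser to one exact file layout. A variant that does not depend on line positions might try to convert each line to integers and treat any line that fails (or is blank) as a separator. This is a rough sketch, not a tested drop-in: the name `parse_numeric_blocks` is hypothetical, and it assumes every block holds the same count of numbers so the rows form a rectangular array.

```python
import numpy as np


def parse_numeric_blocks(lines):
    """Hypothetical helper: flatten each run of integer-only lines into one row."""
    rows, current = [], []
    for line in lines:
        try:
            values = [int(tok) for tok in line.split()]
        except ValueError:
            values = []  # the line contains non-integer text
        if values:
            current.extend(values)   # still inside a numeric block
        elif current:
            rows.append(current)     # a text or blank line closes the block
            current = []
    if current:
        rows.append(current)         # block that runs to the end of the input
    return np.array(rows)


sample = [
    "text a", "bla bla", "1 2 3", "4 5 6",
    "text b", "bla", "7 8 9", "10 11 12",
    "text c", "bla bla bla", "13 14 15", "16 17 18",
]
arr = parse_numeric_blocks(sample)
```

Any iterable of lines works as input, so an open file handle can be passed in place of the `sample` list.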
Upvotes: 0