Reputation: 876
I'm trying to replicate the format of an existing data file, which has the following class structure when loaded with np.load:
<class 'numpy.ndarray'>
<class 'list'>
<class 'list'>
<class 'numpy.str_'>
It is an ndarray containing lists of lists of strings.
I'm using the following code to build the same structure (a list of lists of lists of strings), and I want to convert only the outermost list into an ndarray, without the inner lists also being converted into ndarrays.
captions = []
for row in attrs.iterrows():
    sorted_row = row[1].sort_values(ascending=False)
    attributes, variations = [], []
    for col, val in sorted_row[:20].iteritems():
        attributes.append([x[1] for x in word2Id if x[0] == col][0])
    variations.append(attributes)
    for i in range(9):
        variations.append(random.sample(attributes, len(attributes)))
    captions.append(variations)
np.save('train_captions.npy', captions)
When I open the resulting npy file, the class hierarchy is like this:
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.str_'>
How can I store captions in the code above so that it has the same structure as the file at the very top?
Upvotes: 0
Views: 4305
Reputation: 71
import numpy as np

items = ["a", "b", "c", "d"]  # named `items` to avoid shadowing the built-in `list`
np.save('list.npy', items)
read_list = np.load('list.npy').tolist()
print(read_list, type(read_list))
# Output: ['a', 'b', 'c', 'd'] <class 'list'>
If we don't use .tolist(), the result is:
['a' 'b' 'c' 'd'] <class 'numpy.ndarray'>
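For the nested case in the question, it may help to see what .tolist() does at every level. The following is a minimal sketch with made-up data (the `nested` list is hypothetical, not the asker's captions): .tolist() converts the whole array, outermost level included, back into plain Python lists, so it yields nested lists rather than an ndarray that holds lists.

```python
import numpy as np

# Hypothetical nested data standing in for the question's captions.
nested = [[["a", "b"], ["c", "d"]], [["e", "f"], ["g", "h"]]]

arr = np.array(nested)   # regular nesting becomes a 3-D string array
back = arr.tolist()      # every level is converted back to Python lists

print(arr.shape)
print(type(back), type(back[0]), type(back[0][0]), type(back[0][0][0]))
```

The innermost elements come back as Python str objects, not numpy.str_, so this on its own does not reproduce the hierarchy shown at the top of the question.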
Upvotes: 2
Reputation: 231355
When I try to replicate your code (more or less):
In [273]: captions = []
In [274]: for r in range(2):
     ...:     attributes, variations = [], []
     ...:     for c in range(2):
     ...:         attributes.append([i for i in ['a','b','c']])
     ...:     variations.append(attributes)
     ...:     for i in range(2):
     ...:         variations.append(random.sample(attributes, len(attributes)))
     ...:     captions.append(variations)
     ...:
In [275]: captions
Out[275]:
[[[['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']]],
 [[['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']]]]
The list has several levels of nesting. When passed to np.array, the result is a 4d array of strings:
In [276]: arr = np.array(captions)
In [277]: arr.shape
Out[277]: (2, 3, 2, 3)
In [278]: arr.dtype
Out[278]: dtype('<U1')
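Where possible, np.array tries to make as high-dimensional an array as it can. np.save converts a plain list to an array in the same way, which is why the saved file contains nested ndarrays.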
To make an array of lists, we have to do something like:
In [279]: arr = np.empty(2, dtype=object)
In [280]: arr[0] = captions[0]
In [281]: arr[1] = captions[1]
In [282]: arr
Out[282]:
array([list([[['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']]]),
       list([[['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']]])],
      dtype=object)
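Putting the object-array approach together with saving and loading, a minimal end-to-end sketch might look like the following. The small `captions` list and the temporary file path are stand-ins I made up for illustration, not the asker's real data. Note that object arrays are stored via pickle, so np.load needs allow_pickle=True in recent NumPy versions.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the `captions` list built in the question.
captions = [
    [["a", "b", "c"], ["c", "a", "b"]],
    [["d", "e", "f"], ["f", "d", "e"]],
]

# Pre-allocate a 1-D object array and fill it slot by slot, so NumPy
# stores each nested list as-is instead of unpacking it into dimensions.
arr = np.empty(len(captions), dtype=object)
for i, item in enumerate(captions):
    arr[i] = item

# Temporary path used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "train_captions.npy")
np.save(path, arr)  # object arrays are pickled on save

# Object arrays can only be loaded with pickling explicitly allowed.
loaded = np.load(path, allow_pickle=True)

print(type(loaded), loaded.shape, loaded.dtype)
print(type(loaded[0]), type(loaded[0][0]), type(loaded[0][0][0]))
```

With plain Python strings as input, the innermost type here is str. If the original file shows numpy.str_ at that level, its strings presumably came from a NumPy source; that is a guess about the original file, not something this sketch reproduces.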
Upvotes: 1