Reputation: 2860
I have multiple objects (part files) in an S3 bucket. I need to read them and concatenate them into a single NumPy array. I am using the code below:
def read_and_concat(bucket, key_list):
    length = len(key_list)
    for index, key in enumerate(key_list):
        s3_client.download_file(bucket, key, 'test.out')
        target_data = genfromtxt('test.out', delimiter=',')
        data_shape = target_data.shape
        data[index] = np.array(data_shape)
        data[index] = target_data
    result = np.concatenate([data[i] for i in range(length)])
    return result
This throws the error NameError: name 'data' is not defined. I guess I need to define data as a 2D NumPy array before using it in the data[index] = np.array(data_shape) line, but I am not sure how.
Or is there something else I am missing?
Please suggest.
Upvotes: 0
Views: 192
Reputation: 750
I think that data needs to be defined before you use it in this case. Assigning by index to a variable that doesn't exist throws a NameError. I also don't think the extra step of creating the array is needed, because genfromtxt already returns an ndarray.
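To illustrate the point about the NameError, here is a minimal sketch that reproduces it in isolation. The function names are made up for this example. It also shows that starting from an empty list and assigning by index does not work either, which is why append is the natural choice.

```python
def index_into_undefined_name():
    """Assign by index to a name that was never created."""
    try:
        data[0] = 1.0  # no variable called data exists, so the lookup fails
    except NameError as err:
        return str(err)


def index_into_empty_list():
    """Assign by index to an empty list."""
    data = []
    try:
        data[0] = 1.0  # the list exists, but it has no position 0 yet
    except IndexError as err:
        return str(err)


def append_to_list():
    """Grow the list with append instead of assigning by index."""
    data = []
    data.append(1.0)
    return data


print(index_into_undefined_name())
print(index_into_empty_list())
print(append_to_list())
```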
def read_and_concat(bucket, key_list):
    length = len(key_list)
    data = []
    for index, key in enumerate(key_list):
        s3_client.download_file(bucket, key, 'test.out')
        data.append(genfromtxt('test.out', delimiter=','))
    return np.concatenate(data)
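For completeness, here is a self-contained sketch of the same idea with the imports included. It rests on a few assumptions that are not in the original post: the S3 client is passed in as an argument (for example one created with boto3.client("s3")), each part file is downloaded to a temporary directory rather than to 'test.out', and every part is a comma-separated file with the same number of columns.

```python
import os
import tempfile

import numpy as np


def read_and_concat(s3_client, bucket, key_list):
    """Download each CSV part file and stack the rows into one array.

    s3_client is assumed to expose download_file(bucket, key, filename),
    as a boto3 S3 client does.
    """
    parts = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "part.csv")
        for key in key_list:
            # Each download overwrites the same temporary file.
            s3_client.download_file(bucket, key, local_path)
            # genfromtxt already returns an ndarray, so no extra array is needed.
            parts.append(np.genfromtxt(local_path, delimiter=","))
    # Concatenates along axis 0; the parts must agree in their other dimensions.
    return np.concatenate(parts)
```

One thing to watch for: np.concatenate raises a ValueError if the parts have different numbers of columns, or if one part has a single row (which genfromtxt returns as a 1D array) while the others are 2D.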
Upvotes: 1