nad

Reputation: 2860

Reading multiple S3 objects into a NumPy array and concatenating them

I have multiple objects (part files) in an S3 bucket. I need to read them and concatenate them into a single NumPy array. I am using the code below:

def read_and_concat(bucket, key_list):
    length = len(key_list)
    for index, key in enumerate(key_list):
        s3_client.download_file(bucket, key, 'test.out')
        target_data = genfromtxt('test.out', delimiter=',')
        data_shape = target_data.shape
        data[index] = np.array(data_shape)
        data[index] = target_data
    result = np.concatenate([data[i] for i in range(length)])
    return result

This raises NameError: name 'data' is not defined. I guess I need to define data as a 2D NumPy array before using it in the data[index] = np.array(data_shape) line, but I am not sure how.

Or is there something else I am missing?

Please suggest.

Upvotes: 0

Views: 192

Answers (1)

MBeale

Reputation: 750

I think data needs to be defined before you use it here. Assigning by index to a name that doesn't exist raises a NameError. The extra step of creating the array should not be needed either, because genfromtxt already returns an ndarray. You can collect the arrays in a list and concatenate them once at the end:

import boto3
import numpy as np

s3_client = boto3.client('s3')

def read_and_concat(bucket, key_list):
    data = []  # one ndarray per part file
    for key in key_list:
        # each download overwrites the same local scratch file
        s3_client.download_file(bucket, key, 'test.out')
        data.append(np.genfromtxt('test.out', delimiter=','))
    return np.concatenate(data)
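
The collect-then-concatenate pattern can be tried without S3. The following is a minimal sketch, not the original poster's setup: the in-memory io.StringIO objects are hypothetical stand-ins for the downloaded part files, and concat_parts is an illustrative helper name. It also assumes every part has more than one column, because genfromtxt returns a 1-D array for a single-row file, and np.atleast_2d is used here to turn that back into one row so the shapes line up.

```python
import io

import numpy as np


def concat_parts(parts):
    """Parse each CSV part into an ndarray and stack the parts row-wise."""
    arrays = []
    for part in parts:
        parsed = np.genfromtxt(part, delimiter=',')
        # Assumption: parts have several columns, so a 1-D result means
        # "one row"; atleast_2d reshapes it from (n_cols,) to (1, n_cols).
        arrays.append(np.atleast_2d(parsed))
    return np.concatenate(arrays)


# Hypothetical part files, held in memory instead of downloaded from S3.
part_a = io.StringIO("1,2,3\n4,5,6\n")
part_b = io.StringIO("7,8,9\n")

result = concat_parts([part_a, part_b])
print(result.shape)
```

Building a plain Python list and calling np.concatenate once avoids having to know the final array size up front, which is what the indexed assignment in the question was trying to work around.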

Upvotes: 1
