KJJ

Reputation: 13

Series of Strings to Arrays

I have some image arrays that I'm trying to run a regression on, but when I import the CSV file the column comes in as a series of strings instead of a series of arrays:

In: image_train = pd.read_csv('image_train_data.csv')
In: image_train['image_array'].head()
Out: 0    [73 77 58 71 68 50 77 69 44 120 116 83 125 120...
     1    [7 5 8 7 5 8 5 4 6 7 4 7 11 5 9 11 5 9 17 11 1...
     2    [169 122 65 131 108 75 193 196 192 218 221 222...
     3    [154 179 152 159 183 157 165 189 162 174 199 1...
     4    [216 195 180 201 178 160 210 184 164 212 188 1...
     Name: image_array, dtype: object

When I try to run the regression using image_train['image_array'] I get:

ValueError: could not convert string to float: '[255 255 255 255 255 255 255 255...

The array is a string.

Is there a way to transform the strings to arrays for the entire series?

Upvotes: 1

Views: 79

Answers (2)

Cory Madden

Reputation: 5203

While AChampion's solution looks good, I went ahead and found another solution:

image_train['image_array'].str.findall(r'\d+').apply(lambda x: list(map(int, x)))

This is useful if you have already loaded the data and don't want to, or can't, load it again.
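As a rough sketch of how that regex approach could feed a regression, here is the same idea on a small made-up Series (the sample values and the names s, arrays and X are assumptions for illustration, not taken from the question's CSV). It assumes every row holds the same number of non-negative integers:

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the question's column:
# bracketed, space-separated integers stored as strings.
s = pd.Series(["[73 77 58]", "[7 5 8]"], name="image_array")

# Pull out each run of digits, then build one integer array per row.
arrays = s.str.findall(r"\d+").apply(
    lambda digits: np.array([int(d) for d in digits])
)

# Stack the per-row arrays into a 2-D matrix (one row per image).
# This only works if every row has the same length.
X = np.stack(arrays.tolist())
print(X.shape)
```

Note that r'\d+' only matches unsigned integers, so this would not be suitable for negative or floating-point values.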

Here's another solution that works well for just evaluating a literal string representation of a list:

pd.eval(image_train['image_array'])

However, if the values are separated by spaces rather than commas, you could replace the spaces first:

pd.eval(image_train['image_array'].str.replace(' ', ','))

Upvotes: 1

AChampion

Reputation: 30288

You can use converters to describe how you want to read that field in. The easiest way would be to define your own converter to treat that column as a list, e.g.:

import ast
import pandas as pd

def conv(x):
    # Turn '[1 2 3]' into '[1,2,3]' so it parses as a Python list literal.
    return ast.literal_eval(','.join(x.split(' ')))

image_train = pd.read_csv('image_train_data.csv', converters={'image_array': conv})
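For a self-contained illustration of the converter approach, the sketch below reads a tiny in-memory CSV instead of image_train_data.csv. The CSV contents and the names csv_text, parse_pixels and X are made up for the example, and it assumes each cell looks like '[1 2 3]' with the same number of values per row:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: two rows of three pixel values.
csv_text = "id,image_array\n1,[73 77 58]\n2,[7 5 8]\n"

def parse_pixels(text):
    # Drop the surrounding brackets, split on whitespace, convert to ints.
    return [int(token) for token in text.strip("[]").split()]

df = pd.read_csv(io.StringIO(csv_text), converters={"image_array": parse_pixels})

# Each cell is now a Python list; build a 2-D array for use as model input.
X = np.array(df["image_array"].tolist())
print(X.shape)
```

Splitting on whitespace with split() rather than split(' ') tolerates repeated spaces, which may or may not matter for the real file.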

Upvotes: 4
