nunodsousa

Reputation: 2785

Save pandas dataframe with numpy arrays column

Let us consider the following pandas dataframe:

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

where each cell of column B holds a numpy array.

If we save the dataframe to CSV and then load it again, each numpy array is converted into a string.

df.to_csv('test.csv', index = False)
df = pd.read_csv('test.csv')

Is there a simple way to solve this problem? In the loaded dataframe, column B contains strings such as '[6 7]' instead of numpy arrays.

Upvotes: 18

Views: 16566

Answers (2)

Abhik Sarkar

Reputation: 965

Use the following function to format each row.

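def formatting(string_numpy):
    """formatting : Conversion of String List to List

    Args:
        string_numpy (str)
    Returns:
        list_values (list): list of string values
    """
    # Split the string into items on ", "
    list_values = string_numpy.split(", ")
    # Drop the two leading characters of the first item
    list_values[0] = list_values[0][2:]
    # Drop the two trailing characters of the last item
    list_values[-1] = list_values[-1][:-2]
    return list_values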

Then apply it to the column (here `col` is the column name) to turn each string back into a list of values.

df[col] = df[col].apply(formatting)
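
For the dataframe in the question, where to_csv writes each array as a space-separated string such as '[6 7]', the parsing can also be done while reading. This is only a sketch: `parse_array` is a hypothetical helper, and it assumes flat integer arrays short enough that numpy does not abbreviate them with '...' when printing (by default that happens above 1000 elements).

```python
import io

import numpy as np
import pandas as pd


def parse_array(cell):
    # "[6 7]" -> array([6, 7]); assumes a flat array of integers
    return np.array([int(token) for token in cell.strip("[]").split()])


df = pd.DataFrame({"A": [1, 4], "B": [np.array([6, 7]), np.array([8, 9])]})

# Write to an in-memory CSV instead of a file to keep the example self-contained
csv_text = df.to_csv(index=False)

# converters runs parse_array on every raw string in column B
loaded = pd.read_csv(io.StringIO(csv_text), converters={"B": parse_array})
```

Using `converters` keeps the parsing next to the read call, so no separate apply step is needed afterwards.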

Upvotes: 0

usernamenotfound

Reputation: 1580

You can pickle the data instead.

df.to_pickle('test.pkl')
df = pd.read_pickle('test.pkl')

This ensures that the format remains the same. However, the file is not human-readable.
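
A quick way to check the round trip is sketched below. The temporary directory and the `test.pkl` file name are just choices for this example.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4], "B": [np.array([6, 7]), np.array([8, 9])]})

# Pickle into a throwaway directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# Each cell of column B is still a numpy array after loading
first = restored["B"].iloc[0]
print(type(first), first)
```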

If human readability is an issue, I would recommend converting it to a JSON file instead.

df.to_json('abc.json')
df = pd.read_json('abc.json')
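
One thing to be aware of with the JSON route: the arrays are written as JSON lists, so after loading the cells are plain Python lists and need to be converted back if numpy arrays are required. A minimal sketch, using an in-memory string rather than 'abc.json' so it is self-contained:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4], "B": [np.array([6, 7]), np.array([8, 9])]})

# With no path argument, to_json returns the JSON text
json_text = df.to_json()
restored = pd.read_json(io.StringIO(json_text))

# Column B now holds Python lists; turn each one back into a numpy array
restored["B"] = restored["B"].apply(lambda values: np.asarray(values))
```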

Upvotes: 21
