Reputation: 47
How do I put a NumPy array into an element (single cell) of a pandas DataFrame? For instance:
Driver  Make  Model  Coordinates
Bob     Ford  Focus  [[1, 0, 1], [1, 2, 3], [2, 0, 2]]
Sally   Ford  Echo   [[0, 0, 1], [0, 2, 0]]
I've tried storing an array in each row, but the documentation doesn't seem to support this.
Context: I am hoping to use df.to_json() to export the data to a JSON file, from which the data can later be read back into a DataFrame where each row is one of the individuals. Should I be thinking about doing this differently?
Upvotes: 0
Views: 1529
Reputation: 4648
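Yes, you can. Use .at[] or .iat[] to avoid broadcasting behavior when attempting to put an iterable into a single cell. The same applies to storing a list or a set in a cell. The target column needs object dtype so that a cell can hold an arbitrary Python object.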
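A minimal sketch of both accessors: .at[] addresses a cell by row and column label, .iat[] by integer position. It assumes the column is created with object dtype (here by assigning None); the column name "Extra" and its list and set values are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Driver": ["Bob", "Sally"]})
df["Coordinates"] = None  # assigning None creates an object-dtype column

# Label-based: row label 0, column label "Coordinates".
df.at[0, "Coordinates"] = np.array([[1, 0, 1], [1, 2, 3], [2, 0, 2]])

# Position-based: row position 1, column position 1 (which is "Coordinates").
df.iat[1, 1] = np.array([[0, 0, 1], [0, 2, 0]])

print(type(df.at[0, "Coordinates"]))  # <class 'numpy.ndarray'>
print(df.at[1, "Coordinates"].shape)  # (2, 3)

# Lists and sets are stored the same way, as single objects ("Extra" is hypothetical).
df["Extra"] = None
df.at[0, "Extra"] = [1, 2, 3]
df.at[1, "Extra"] = {"a", "b"}
```

Each cell keeps the exact object that was assigned, so the arrays keep their own shapes even though they differ between rows.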
The downside: it can be quite hard to do such assignments in an optimized (vectorized) way that avoids iterating through rows. That said, it is still workable for reasonably sized data. If you really have to store millions of such arrays, a fundamental redesign may be required, e.g. restructuring your code or using MongoDB or another storage tool instead.
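One possible restructuring, offered as an alternative layout rather than something this answer prescribes, is a "long" table with one coordinate triple per row in plain numeric columns, keyed by driver. Column-wise work then needs no Python loop over rows, and the per-driver arrays can be rebuilt with groupby when needed. The column names x, y and z are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Per-driver arrays from the question.
coords = {
    "Bob": np.array([[1, 0, 1], [1, 2, 3], [2, 0, 2]]),
    "Sally": np.array([[0, 0, 1], [0, 2, 0]]),
}

# Long layout: stack all points into one numeric block and repeat each
# driver's name once per point (x, y, z are hypothetical column names).
names = list(coords)
arrays = list(coords.values())
long_df = pd.DataFrame(np.vstack(arrays), columns=["x", "y", "z"])
long_df.insert(0, "Driver", np.repeat(names, [len(a) for a in arrays]))

# Column-wise operations run without a Python loop over rows,
# e.g. the mean point of each driver.
mean_points = long_df.groupby("Driver")[["x", "y", "z"]].mean()

# Rebuild the per-driver 2-D arrays only when they are actually needed.
rebuilt = {name: g[["x", "y", "z"]].to_numpy()
           for name, g in long_df.groupby("Driver")}
```

This layout also serializes with to_json() without any special handling, since every cell is a plain scalar.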
import pandas as pd
import numpy as np
# preallocate the output dataframe; dtype=object lets every cell hold any Python object
df = pd.DataFrame(
    data=np.zeros((2, 4), dtype=object),
    columns=["Driver", "Make", "Model", "Coordinates"]
)
# element-wise assignment
df.at[0, "Coordinates"] = np.array([[1, 0, 1], [1, 2, 3], [2, 0, 2]])
df.at[1, "Coordinates"] = np.array([[0, 0, 1], [0, 2, 0]])
# the Driver, Make and Model cells are left as 0 for brevity
Result
print(df)
   Driver  Make  Model                        Coordinates
0       0     0      0  [[1, 0, 1], [1, 2, 3], [2, 0, 2]]
1       0     0      0             [[0, 0, 1], [0, 2, 0]]
print(df.at[0, "Coordinates"])
[[1 0 1]
 [1 2 3]
 [2 0 2]]
print(type(df.at[0, "Coordinates"]))
<class 'numpy.ndarray'>
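For the to_json() goal in the question, here is a sketch of a full round trip with the omitted columns filled in from the question's table. Assumptions: orient="records" is chosen so that each JSON object is one driver, the arrays are converted to nested lists before export so they map directly onto JSON arrays, and an in-memory io.StringIO stands in for the file. read_json hands the coordinates back as plain lists, so they are converted to ndarrays again afterwards.

```python
import io

import numpy as np
import pandas as pd

# Build the frame from the question, including the Driver/Make/Model values.
df = pd.DataFrame({
    "Driver": ["Bob", "Sally"],
    "Make": ["Ford", "Ford"],
    "Model": ["Focus", "Echo"],
})
df["Coordinates"] = None  # object dtype, so each cell can hold an ndarray
df.at[0, "Coordinates"] = np.array([[1, 0, 1], [1, 2, 3], [2, 0, 2]])
df.at[1, "Coordinates"] = np.array([[0, 0, 1], [0, 2, 0]])

# Export: turn each ndarray into nested lists for the JSON output.
exportable = df.assign(Coordinates=df["Coordinates"].map(lambda a: a.tolist()))
json_text = exportable.to_json(orient="records")  # one JSON object per driver

# Import: read_json yields plain lists, so convert them back to ndarrays.
restored = pd.read_json(io.StringIO(json_text), orient="records")
restored["Coordinates"] = restored["Coordinates"].map(np.array)
```

Writing to a real file only changes the target: pass a path to to_json() and read_json().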
Upvotes: 2