How do I convert a pandas DataFrame into a NumPy array?

Question

Below is a snippet that converts data into a NumPy array. It is then converted to a Pandas DataFrame where I intend to process it. I'm attempting to convert it back to a NumPy array. I'm failing at this. Badly.

import pandas as pd
import numpy as np
from pprint import pprint

data = [
    ('2020-11-01 00:00:00', 1.0),
    ('2020-11-02 00:00:00', 2.0)
]
coordinatesType = [('timestamp', 'datetime64[s]'), ('value', '


This yields crazytown:
array([[('2020-11-01T00:00:00', 1.6041888e+18),
        ('1970-01-01T00:00:01', 1.0000000e+00)],
       [('2020-11-02T00:00:00', 1.6042752e+18),
        ('1970-01-01T00:00:02', 2.0000000e+00)]],
      dtype=[('timestamp', '

I realize a DataFrame is really a fancy NumPy array under the hood, but I'm passing back to a function that accepts a simple NumPy array. Clearly I'm not handling dtypes correctly and/or I don't understand the data structure inside my DataFrame. Below is what the function I'm calling expects:
[('2020-11-01T00:00:00', 1.000   ),
 ('2020-11-02T00:00:00', 2.000  )],
 dtype=[('timestamp', '

I'm really lost on how to do this. Or what I should be doing instead.
Help!

As @hpaul suggested, I tried the following:
# ...
df = df.set_index('timestamp')

# do some pandas processing, then convert back to a numpy array

mutatedNpArray = df.to_records(coordinatesType)
# ...

All good!

Cainã Max Couto da Silva · Accepted Answer

Besides the to_records approach mentioned in comments, you can do:

df.apply(tuple, axis=1).to_numpy(coordinatesType)

Output:

array([('2020-11-01T00:00:00', 1.), ('2020-11-02T00:00:00', 2.)],
      dtype=[('timestamp', '



Considerations:
I believe the issue here comes from the difference in shape between the original array and the DataFrame.
The shape of your original NumPy array is (2,), where each element is a single (timestamp, value) record. Once the DataFrame is created, both df.shape and df.to_numpy().shape are (2, 2), so the dtype constructor does not work as expected: the structured dtype is applied to each of the four cells separately, which is why every cell in your output turned into its own (timestamp, value) pair. Converting each row to a tuple in a pd.Series gives you back the original shape of (2,).
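To make the shape difference concrete, here is a minimal sketch. It assumes the truncated dtype in the question ends in a float64 field ('f8') and that the DataFrame is built straight from the raw tuples with column names matching the dtype fields. The names coordinates_type, structured and rebuilt are chosen here for illustration only. The rebuild step uses itertuples to get plain row tuples, which is a variant of the apply(tuple) idea rather than the exact call shown above.

```python
import numpy as np
import pandas as pd

data = [
    ('2020-11-01 00:00:00', 1.0),
    ('2020-11-02 00:00:00', 2.0),
]

# Assumption: the question's dtype is cut off, so 'f8' (float64) is a guess.
coordinates_type = [('timestamp', 'datetime64[s]'), ('value', 'f8')]

# The original structured array: one record per row, so it is 1-D.
structured = np.array(data, dtype=coordinates_type)
print(structured.shape)       # (2,)

# The DataFrame, and its plain NumPy view, are 2-D: one cell per row and column.
df = pd.DataFrame(data, columns=['timestamp', 'value'])
print(df.shape)               # (2, 2)
print(df.to_numpy().shape)    # (2, 2)

# Turning each row into one tuple restores the 1-D layout.
rows = df.apply(tuple, axis=1)
print(rows.shape)             # (2,)

# Rebuild the structured array from plain row tuples, so the dtype is
# applied once per row instead of once per cell.
rebuilt = np.array(list(df.itertuples(index=False, name=None)),
                   dtype=coordinates_type)
print(rebuilt.shape)          # (2,)
```

If your processing changes the column order, reorder the columns to match the dtype's field order before rebuilding, since the tuples are matched to fields by position.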
