DaveB

Reputation: 452

Unpacking .NET collection into Pandas dataframe more quickly than Python iterator

I have built a Python wrapper around a .NET API. The wrapper is currently very slow at "unpacking" a .NET collection object into the desired pd.Series object to be returned. I would like to accelerate this part of the code by wrapping some C code to do the unpacking.

Detail

This API (specifically the OSI Pi AFSDK) is used to retrieve timeseries data from a proprietary database. The API call is made through the pythonnet library and returns a .NET collection called an AFValues object. This is a collection of AFValue objects, each of which contains a timestamp and a value field, amongst other information. At present I "unzip" each of these objects using Python list comprehensions and combine the results to form the series. Here's a much simplified version:

import pandas as pd

# afvalues is the AFValues collection returned by the API call
timestamps = [afvalue.Timestamp for afvalue in afvalues]
# (There is actually some timezone handling etc. in the above as well)
values = [afvalue.Value for afvalue in afvalues]
result = pd.Series(index=timestamps, data=values)

These list comprehensions are noticeably slow on very large collections (i.e. millions of values).
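For reference, the simplified snippet walks the collection twice, once per comprehension. A minimal sketch of a single-pass variant that fills preallocated NumPy arrays is below. It is only an illustration under stated assumptions: `FakeAFValue` is a hypothetical pure-Python stand-in for the .NET AFValue, its `Timestamp` is assumed to already be a naive Python `datetime` (the real conversion and timezone handling would go where marked), and `count` is assumed to be known up front. Whether this helps in practice depends on how much of the time is spent crossing the Python/.NET boundary per element, which this sketch does not remove.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


@dataclass
class FakeAFValue:
    """Hypothetical stand-in for a .NET AFValue; not the real class."""
    Timestamp: datetime
    Value: float


def unpack_single_pass(afvalues, count):
    """Build a Series by iterating the collection once into preallocated arrays."""
    timestamps = np.empty(count, dtype="datetime64[ns]")
    values = np.empty(count, dtype=np.float64)
    for i, afvalue in enumerate(afvalues):
        # Assumption: Timestamp is a naive Python datetime here. With the real
        # API, the .NET-to-Python conversion and timezone handling go here.
        timestamps[i] = np.datetime64(afvalue.Timestamp, "ns")
        values[i] = afvalue.Value
    return pd.Series(values, index=pd.DatetimeIndex(timestamps))


# Usage with stand-in data (five one-second samples)
start = datetime(2024, 1, 1)
sample = [FakeAFValue(start + timedelta(seconds=i), float(i)) for i in range(5)]
result = unpack_single_pass(sample, len(sample))
```

Preallocating typed arrays also avoids building two intermediate Python lists of objects before pandas converts them, which may matter at millions of values.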

Desired outcome

Ideally I would like to:

I believe the above is how Pandas and NumPy achieve their speed in large operations. Is this the right approach, and do you have any suggestions on how I would go about coding it?

Upvotes: 1

Views: 225

Answers (0)
