I have built a Python wrapper around a .NET API. The wrapper is currently very slow at "unpacking" a .NET collection object into the desired pd.Series
object to be returned. I would like to accelerate this part of the code by wrapping some C code to do the unpacking.
Detail
This API (specifically the OSI Pi AFSDK) is used to retrieve timeseries data from a proprietary database. The API call is made through the pythonnet
library and returns a .NET collection called an AFValues object. The object is a collection of AFValue objects, each of which contains a timestamp and a value field, amongst other information. At present I "unzip" each of these objects using Python list comprehensions and combine the results to form the series. Here's a much simplified version:
timestamps = [afvalue.Timestamp for afvalue in afvalues]
# (There is actually some timezone handling etc in the above as well)
values = [afvalue.Value for afvalue in afvalues]
result = pd.Series(index=timestamps, data=values)
These list comprehensions are noticeably slow on very large collections (i.e. millions of values).
Desired outcome
Ideally I would like to:
1. Pass the pythonnet AFValues object into some precompiled code written in C (or maybe .NET? Open to suggestions).
2. Get back a NumPy array or similar to convert to a Pandas object.

I believe the above is how Pandas and NumPy achieve their speed in large operations. Is this the right approach, and do you have any suggestions on how I would go about coding it?
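
To make the target data shape concrete, here is a minimal sketch of the unpacking step as a single pass that fills two contiguous buffers of doubles. It is not a speed-up in itself, since the loop still runs in Python; it only illustrates what a compiled helper would need to hand back. `FakeAFValue` and `unpack` are hypothetical names made up for illustration, the stand-in class is not part of the AF SDK, and storing timestamps as UTC epoch seconds is an assumption.

```python
from array import array
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FakeAFValue:
    """Hypothetical stand-in for a .NET AFValue; not part of the AF SDK."""
    Timestamp: datetime
    Value: float


def unpack(afvalues):
    """Walk the collection once, filling two contiguous buffers of doubles."""
    stamps = array("d")  # assumption: timestamps kept as UTC epoch seconds
    values = array("d")
    for item in afvalues:
        stamps.append(item.Timestamp.timestamp())
        values.append(float(item.Value))
    return stamps, values


sample = [
    FakeAFValue(datetime(2020, 1, 1, tzinfo=timezone.utc), 1.5),
    FakeAFValue(datetime(2020, 1, 1, 0, 0, 10, tzinfo=timezone.utc), 2.5),
]
stamps, values = unpack(sample)
```

Both buffers support the buffer protocol, so `numpy.frombuffer(values, dtype="float64")` should give a NumPy view without copying, and `pd.to_datetime(..., unit="s", utc=True)` on the timestamp array could then build the index for the pd.Series.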