I have some pretty big HDF files (10e9 rows, about 100 GB) containing [X,Y,Z,Sensor_0,...,Sensor_n] values. For processing I am using vaex, which gives me nice and fast results. However, I am struggling with the following issue:
I haven't found a way to make a new expression object containing just every nth row of the DataFrame. In pandas I would do it like this: `df_new_nth_X = df.X[::50]` to get only every 50th value for the new DataFrame, but that approach is obviously very memory-consuming for DataFrames of my size.
So I would like to "filter" the vaex DataFrame, or make an expression object containing only every nth value, before turning it into an array.
This seems like a very basic question, but I haven't found a solution after reading the docs. I am not even sure whether this is possible at all with memory-mapped objects...
Best regards, Bastian
You can use a neat trick to achieve what you want. Consider the following code:
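```python
import vaex

# Example df that comes with vaex
df = vaex.example()

# Add a virtual index column (vaex.vrange is like numpy.arange but takes no memory)
df['index'] = vaex.vrange(0, len(df))

# Make a filter / selection based on that index,
# e.g. keep one row out of every 50
df_nth = df[df['index'] % 50 == 0]

# Only now materialize the filtered values of one column as an array
feh_nth = df_nth['FeH'].values
```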
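The filter `index % 50 == 0` keeps exactly the rows that `df.X[::50]` would select in pandas. The small sketch below checks that equivalence with plain NumPy rather than vaex, so it runs without any HDF file; `every_nth_mask` is a hypothetical helper, and `np.arange` stands in for the virtual `vaex.vrange` column. The `offset` parameter (assumed to satisfy `0 <= offset < n`) shows how the same idea picks a different phase, like `[offset::n]`.

```python
import numpy as np


def every_nth_mask(length, n, offset=0):
    """Hypothetical helper: boolean mask selecting rows offset, offset+n, offset+2n, ..."""
    # np.arange plays the role of the virtual index column vaex.vrange(0, len(df))
    index = np.arange(length)
    return index % n == offset


# Stand-in for a single sensor column
values = np.random.default_rng(0).normal(size=1_000)
n = 50

mask = every_nth_mask(len(values), n)
thinned = values[mask]

# The mask-based filter picks exactly the same rows as slicing values[::n]
assert np.array_equal(thinned, values[::n])
print(len(thinned))  # 20
```

Unlike this NumPy version, which allocates the whole index array, the vaex version keeps the index virtual, so only the filtered values end up in memory when `.values` is called.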