AnthonyML

Date distribution histogram in Vaex

I'm trying to convert the answer from here into Vaex so I can plot a bar graph/histogram of dates from a dataframe. I tried different operations after the groupby, such as .sum(), but can't get it working. Is there a better way to do this in Vaex?

Answers (1)

You can pass vaex.agg.sum() to groupby (or vaex.agg.count() if you only want to count rows). Here is an example with fictitious sales data. Note that pandas is used only to create a CSV file, which is then read with vaex:

import pandas as pd
import numpy as np
import vaex

np.random.seed(0)
dates = pd.date_range('20230101', periods=60) 
data = {
    'date': np.random.choice(dates, 500),
    'product_id': np.random.choice(['A', 'B', 'C'], 500),
    'quantity': np.random.randint(1, 10, 500),
    'price_per_unit': np.random.uniform(10, 50, 500)
}
pdf = pd.DataFrame(data)

csv_file_path = 'sample_sales_data.csv'
pdf.to_csv(csv_file_path, index=False)

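# Read the CSV with vaex; extra keyword arguments such as parse_dates are passed on to pandas.read_csv
df = vaex.from_csv(csv_file_path, parse_dates=['date'])
# Virtual column holding the sales value of each row
df['total_sales'] = df['quantity'] * df['price_per_unit']
# Month label such as '2023-01', used as the grouping key below
df['year_month'] = df.date.dt.strftime('%Y-%m')
# Sum total_sales per product and per month
result_product = df.groupby('product_id', agg={'total_sales_sum': vaex.agg.sum(df['total_sales'])})
result_month = df.groupby('year_month', agg={'total_sales_sum': vaex.agg.sum(df['total_sales'])})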

result_product_df = result_product.to_pandas_df()
result_month_df = result_month.to_pandas_df()

result_product_df, result_month_df

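which gives the following (the groups come back unsorted, and 2023-03 is small because the 60-day range ends on 2023-03-01):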

(  product_id  total_sales_sum
 0          B     23406.541203
 1          A     23120.765300
 2          C     24332.454628,
   year_month  total_sales_sum
 0    2023-02     33218.240290
 1    2023-01     36190.503868
 2    2023-03      1451.016974)
