Reputation: 679
I would like to retrieve a column from a csv file and make it an index in a dataframe. However, I realize that I might need to do another step beforehand.
The csv looks like this;
Date,Step,Order,Price
2011-01-10,Step,BUY,150
2011-01-10,Step,SELL,150
2011-01-13,Step,SELL,150
2011-01-13,Step1,BUY,400
2011-01-26,Step2,BUY,100
If I print the dataframe this is the output:
Date Step Order Price
0 0 Step BUY 150
1 1 Step SELL 150
2 2 Step SELL 150
3 3 Step1 BUY 400
4 4 Step2 BUY 100
However, the output that I would like is to tell how many buys/sells per type of Step I have on each day.
For example;
The expected dataframe and output are:
Date Num-Buy-Sell
2011-01-10 2
2011-01-13 2
2011-01-16 1
This is the code on how I'm retrieving the data frame;
num_trasanctions_day = pd.read_csv(orders_file, parse_dates=True, sep=',', dayfirst=True)
num_trasanctions_day['Transactions'] = orders.groupby(['Date', 'Order'])
num_trasanctions_day['Date'] = num_trasanctions_day.index
My first thought was to make date the index, but I guess I need to calculate how many sell/buys are there per date.
Error
KeyError: 'Order'
Thanks
Upvotes: 1
Views: 5111
Reputation: 323306
Just using value_counts
df.Date.value_counts()
Out[27]:
2011-01-13 2
2011-01-10 2
2011-01-26 1
Name: Date, dtype: int64
Edit: If you want to assign it back , you are looking for transform
also, please modify your expected output.
df['Transactions']=df.groupby('Date')['Order'].transform('count')
df
Out[122]:
Date Step Order Price Transactions
0 2011-01-10 Step BUY 150 2
1 2011-01-10 Step SELL 150 2
2 2011-01-13 Step SELL 150 2
3 2011-01-13 Step1 BUY 400 2
4 2011-01-26 Step2 BUY 100 1
Upvotes: 1