Reputation: 81
I have a dataframe which, for simplicity, looks like so:
import pandas as pd
!pip install dfply
from dfply import *
stocks = {'Ticker': ['AA', 'AA', 'BB', 'BB'], 'Date': ['2019-01-01', '2019-01-03', '2012-01-01', '2012-01-03']}
stocks = pd.DataFrame.from_dict(stocks))
print(stocks)
Ticker Date
AA 2019-01-01
AA 2019-01-03
BB 2012-01-01
BB 2012-01-03
One issue I have is that the dataset I have created has some stocks which became public at different dates than others, so I am trying to create a column which specifies the first date in which the ticker symbol appears. To do so, I use the following for loop:
def beginDate(stock): #returns the first day that stock is on market in dataset
return(min(
(stocks >> mask(stocks['Ticker'] == stock))['Date']
))
begin_market = []
for i in range(len(stocks)):
beginning_date = beginDate(stocks['Ticker'][i])
begin_market.append(begin_market)
I aim to have results that would look like so:
Ticker Date begin_market
AA 2019-01-01 2019-01-01
AA 2019-01-03 2019-01-01
BB 2012-01-01 2012-01-01
BB 2012-01-03 2012-01-01
The issue I am running into is that I have a large dataframe, and so the for loop is taking forever to complete. I am reading up on vectorizing the for loop in order to gain computational speed, but I am confused in how exactly it works besides with if/else type statements. Would anyone know how you might vectorized this in order to boost its speed? If so, I would greatly appreciate it!
Upvotes: 2
Views: 60
Reputation: 195408
If I understand you right, you want to groupby by "Ticker" and then select earliest date from that group:
stocks = {'Ticker': ['AA', 'AA', 'BB', 'BB'], 'Date': ['2019-01-01', '2019-01-03', '2012-01-01', '2012-01-03']}
stocks = pd.DataFrame.from_dict(stocks)
stocks['Date'] = pd.to_datetime(stocks['Date'])
stocks['First Appearance'] = stocks.groupby('Ticker')['Date'].transform('min')
print(stocks)
Prints:
Ticker Date First Appearance
0 AA 2019-01-01 2019-01-01
1 AA 2019-01-03 2019-01-01
2 BB 2012-01-01 2012-01-01
3 BB 2012-01-03 2012-01-01
Upvotes: 1