Reputation: 39
I have 1 minute OHLCV Candlestick data, and I need to aggregate it to create 15m Candlesticks. The database comes from MongoDB; this is a version in clean Python:
def get_candela(self,tf):
c = dict()
candel = dict()
candele_finale = list()
prov_c = list()
db = database("price_data", "1min_OHLC_XBTUSD")
col = database.get_collection(db,"1min_OHLC_XBTUSD")
db_candela = col.find({}, sort = [('timestamp', pymongo.ASCENDING)]).limit(20)
candele = list(db_candela)
timestamp_calc = list()
open_calc = list()
max_calc = list()
min_calc = list()
close_calc = list()
vol_calc = list()
#for _ in range(len(candele)):
for a in range(tf):
if len(candele) == 0:
break
prov_c.append(candele[a])
c.append(prov_c)
candele[:tf]=[]
for b in range(len(c)):
cndl = c[b]
for d in range(tf):
print(cndl)
cnd = cndl[d]
#print(len(cnd))
timestamp_calc.append(cnd["timestamp"])
open_calc.append(cnd["open"])
max_calc.append(cnd["high"])
min_calc.append(cnd["low"])
close_calc.append(cnd["close"])
vol_calc.append(cnd["volume"])
index_close=len(close_calc)
candel["timestamp"] = timestamp_calc[d]
candel["open"] = open_calc[0]
candel["high"] = max(max_calc)
candel["low"] = min(min_calc)
candel["close"] = close_calc[index_close-1]
candel["volume"] = sum(vol_calc)
#print(candel)
candele_finale.append(candel)
max_calc.clear()
min_calc.clear()
vol_calc.clear()
return candele_finale
This returns an array with only the last candlestick created. And this is another version in pandas:
db = database("price_data", "1min_OHLC_XBTUSD")
col = database.get_collection(db,"1min_OHLC_XBTUSD")
db_candela = col.find({}, sort = [('timestamp', pymongo.ASCENDING)]).limit(20)
prov_c = list()
for item in db_candela:
cc={"timestamp":item["timestamp"],"open":item["open"],"high":item["high"],"low":item["low"],"close":item["close"],"volume":item["volume"]}
prov_c.append(cc)
print(prov_c)
data = pandas.DataFrame([prov_c], index=[pandas.to_datetime(cc["timestamp"])])
#print(data)
df = data.resample('5T').agg({'timestamp':'first','open':'first','high':'max', 'low':'min','close' : 'last','volume': 'sum'})
#print(data.mean())
#with pandas.option_context('display.max_rows', None, 'display.max_columns',None): # more options can be specified also
pprint(df)
This returns a dataframe with weird/random values.
Upvotes: 0
Views: 1252
Reputation: 31
I had the same question today, and I answered it. Basically there is a pandas function called resample that does all of the work for you.
Here is my code:
import json
import pandas as pd
#load the raw data and clean it up
data_json = open('./testapiresults.json') #load json object
data_dict = json.load(data_json) #convert to a dict object
df = pd.DataFrame.from_dict(data_dict) #convert to panel data's dataframe object
df['datetime'] = pd.to_datetime(df['datetime'],unit='ms') #convert from unix time (ms from epoch) to human time
df = df.set_index(pd.DatetimeIndex(df['datetime'])) #set time series as index format for resample function
#resample to aggregate to a different frequency of data
candle_summary = pd.DataFrame()
candle_summary['open'] = df.open.resample('15Min').first()
candle_summary['high'] = df.high.resample('15Min').max()
candle_summary['low'] = df.low.resample('15Min').min()
candle_summary['close'] = df.close.resample('15Min').last()
candle_summary.head()
I had to export to csv and recalculate it in excel to doublecheck that it was calculating correctly, but this works. I have not figured out the pandas.DataFrame.resample('30min').ohlc() function, but that looks like it could make some really elegant code if I could figure out how to make it work without tons of errors.
Upvotes: 1