Reputation: 125
I have a dataset with many financial signal values for different stocks at different times. For example:
StockName Date Signal1 Signal2
----------------------------------
Stock1 1/1/20 a b
Stock1 1/2/20 c d
.
.
.
Stock2 1/1/20 e f
Stock2 1/2/20 g h
.
.
.
I would like to build a time series table like the one below and cluster the stocks based on both Signal1 and Signal2 (2 variables):
StockName 1/1/20 1/2/20 ........ Cluster#
----------------------------------------------------
Stock1 [a,b] [c,d] 0
Stock2 [e,f] [g,h] 1
Stock3 ...... ..... 0
.
.
.
1) Is there a way to do this, i.e. cluster stocks based on multiple variables in time series data? I searched online, but everything I found is about clustering time series based on a single variable.
2) Is there also a way to cluster different stocks at different times? (So that, for example, Stock1 at time1 may be in the same cluster as Stock2 at time3.)
Upvotes: 4
Views: 5519
Reputation: 125
Good material to read (Title: Time Series Clustering and Dimensionality Reduction)
https://towardsdatascience.com/time-series-clustering-and-dimensionality-reduction-5b3b4e84f6a3
Upvotes: 1
Reputation: 20342
I am revising my answer here, based on the new information that you last posted.
import pandas as pd
import time
import numpy as np
from mxnet import nd, autograd, gluon
from mxnet.gluon import nn, rnn
import mxnet as mx
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
# %matplotlib inline
from sklearn.decomposition import PCA
import math
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
import xgboost as xgb
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")
context = mx.cpu(); model_ctx=mx.cpu()
mx.random.seed(1719)
# Note: The purpose of this section (3. The Data) is to show the data preprocessing and to give rationale for using different sources of data, hence I will only use a subset of the full data (that is used for training).
def parser(x):
    return datetime.datetime.strptime(x, '%Y-%m-%d')
# dataset_ex_df = pd.read_csv('data/panel_data_close.csv', header=0, parse_dates=[0], date_parser=parser)
import yfinance as yf
# Get the data for the stock GS (Goldman Sachs)
start = '2018-01-01'
end = '2020-04-22'
data = yf.download('GS', start, end)
data = data.reset_index()
data
data.dtypes
# re-name field from 'Adj Close' to 'Adj_Close'
data = data.rename(columns={"Adj Close": "Adj_Close"})
data
num_training_days = int(data.shape[0]*.7)
print('Number of training days: {}. Number of test days: {}.'.format(num_training_days, data.shape[0]-num_training_days))
# TECHNICAL INDICATORS
#def get_technical_indicators(dataset):
# Create 7 and 21 days Moving Average
data['ma7'] = data['Adj_Close'].rolling(window=7).mean()
data['ma21'] = data['Adj_Close'].rolling(window=21).mean()
# Create exponential weighted moving average
data['26ema'] = data['Adj_Close'].ewm(span=26).mean()
data['12ema'] = data['Adj_Close'].ewm(span=12).mean()
data['MACD'] = (data['12ema']-data['26ema'])
# Create Bollinger Bands
data['20sd'] = data['Adj_Close'].rolling(window=20).std()
data['upper_band'] = data['ma21'] + (data['20sd']*2)
data['lower_band'] = data['ma21'] - (data['20sd']*2)
# Create Exponential moving average
data['ema'] = data['Adj_Close'].ewm(com=0.5).mean()
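# Create Momentum
# (this is just the price minus 1; a conventional n-day momentum would be data['Adj_Close'].diff(n))
data['momentum'] = data['Adj_Close']-1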
dataset_TI_df = data
dataset = data
def plot_technical_indicators(dataset, last_days):
    plt.figure(figsize=(16, 10), dpi=100)
    shape_0 = dataset.shape[0]
    xmacd_ = shape_0 - last_days
    # Keep only the last `last_days` rows
    dataset = dataset.iloc[-last_days:, :]
    x_ = list(dataset.index)
    # Plot first subplot: price, moving averages and Bollinger Bands
    plt.subplot(2, 1, 1)
    plt.plot(dataset['ma7'], label='MA 7', color='g', linestyle='--')
    plt.plot(dataset['Adj_Close'], label='Closing Price', color='b')
    plt.plot(dataset['ma21'], label='MA 21', color='r', linestyle='--')
    plt.plot(dataset['upper_band'], label='Upper Band', color='c')
    plt.plot(dataset['lower_band'], label='Lower Band', color='c')
    plt.fill_between(x_, dataset['lower_band'], dataset['upper_band'], alpha=0.35)
    plt.title('Technical indicators for Goldman Sachs - last {} days.'.format(last_days))
    plt.ylabel('USD')
    plt.legend()
    # Plot second subplot: MACD with reference lines at +15 and -15
    plt.subplot(2, 1, 2)
    plt.title('MACD')
    plt.plot(dataset['MACD'], label='MACD', linestyle='-.')
    plt.hlines(15, xmacd_, shape_0, colors='g', linestyles='--')
    plt.hlines(-15, xmacd_, shape_0, colors='g', linestyles='--')
    # plt.plot(dataset['log_momentum'],label='Momentum', color='b',linestyle='-')
    plt.legend()
    plt.show()
plot_technical_indicators(dataset_TI_df, 400)
This will give you some signals to work with. Of course, the features can be anything you want. Note that this is technical analysis, not fundamental analysis. At this point you can do your clustering, or whatever else you want, on these features.
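For the first part of the question (one cluster label per stock, using both signals over time), one simple option is to pivot the long table so that each stock becomes a single row holding all of its Signal1 and Signal2 values, and then run k-means on those rows. Here is a minimal sketch of that idea. The data is made up, the column names are taken from the question, and the choice of two clusters is only an assumption for the illustration. It also assumes every stock has a value for every date; missing dates would leave NaNs that k-means cannot handle.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up long-format data: one row per (stock, date) with two signals.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=5, freq="D")
rows = []
for i, name in enumerate(["Stock1", "Stock2", "Stock3", "Stock4"]):
    level = 0.0 if i < 2 else 5.0  # two artificial groups, for illustration only
    for d in dates:
        rows.append({
            "StockName": name,
            "Date": d,
            "Signal1": level + rng.normal(scale=0.1),
            "Signal2": level + rng.normal(scale=0.1),
        })
long_df = pd.DataFrame(rows)

# Standardise each signal so that neither one dominates the distance.
signals = ["Signal1", "Signal2"]
long_df[signals] = StandardScaler().fit_transform(long_df[signals])

# One row per stock; the columns are (signal, date) pairs.
wide = long_df.pivot(index="StockName", columns="Date", values=signals)

# Each stock is now a vector of length n_signals * n_dates.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = pd.Series(kmeans.fit_predict(wide.to_numpy()),
                     index=wide.index, name="Cluster")
print(clusters)
```

Plain k-means on the flattened vectors compares the series date by date, so it will not match stocks whose patterns are similar but shifted in time. If that matters, a time-series-aware distance would be a different technique from the one sketched here.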
Here is a good link for clustering.
https://www.pythonforfinance.net/2018/02/08/stock-clusters-using-k-means-algorithm-in-python/
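For the second part of the question (Stock1 at time1 ending up in the same cluster as Stock2 at time3), you can skip the pivot and treat every (stock, date) row as its own observation. Below is a rough sketch under the same assumptions as before: made-up values, the column names from the question, and two clusters picked arbitrarily.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up long-format data: each row is one stock on one date.
obs = pd.DataFrame({
    "StockName": ["Stock1", "Stock1", "Stock1", "Stock2", "Stock2", "Stock2"],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "Signal1": [0.1, 0.2, 5.1, 5.0, 4.9, 0.0],
    "Signal2": [0.0, 0.1, 4.8, 5.2, 5.1, 0.2],
})

# Scale the two signals, then cluster the rows directly.
features = StandardScaler().fit_transform(obs[["Signal1", "Signal2"]])
obs["Cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(obs)
```

Here the label belongs to a stock on a given date, not to the stock as a whole, so the same stock can move between clusters over time.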
Upvotes: 2