Alteredorange
Alteredorange

Reputation: 636

Adding Column To Each Quantmod Symbol

Learning R, not sure how to solve this.

library(quantmod)
library(xts)

# get market data
Nasdaq100_Symbols <- c("AAPL", "AAL")
getSymbols(Nasdaq100_Symbols)

# merge them together
nasdaq100 <- data.frame(as.xts(merge(AAPL, AAL)))
#tail(nasdaq100[,1:12],2)

#make percent difference column
nasdaq100$PD <- (((nasdaq100$AAPL.High - nasdaq100$AAPL.Open)/nasdaq100$AAPL.Open) * 100)

I'm trying to add a percent difference column, but the above code will only work for the AAPL symbol (or whichever symbol you use) and not create a PD column for each symbol.

Do you have to add that column somehow before merging with xts, or can I tell R to create it for each symbol in the new merged frame?

Edit: I am doing data training, so I need all symbols to be headers, like:

           AAPL.Ope AAPL.High AAPL.Volume AAL.Open AAL.High

1/3/2007    86.29   86.58      309579900    53.89   56.92
1/4/2007    84.05   85.95      211815100    56.3    59.15
1/5/2007    85.77   86.2       208685400    58.83   59.15

Upvotes: 1

Views: 775

Answers (3)

hvollmeier
hvollmeier

Reputation: 2986

I would like to add a solution using the quite new and interesting package tidyquant which is great for tasks like this. You can use all the stuff from tidyverse and on top it leverages the quantitative power of xts, quantmod and TTR ! Have a look at the vignette with many examples.

library(tidyquant)
c("AAPL", "AAL") %>%
  tq_get(get = "stock.prices") %>%
  group_by(symbol.x) %>%
  tq_mutate(ohlc_fun = OHLCV, mutate_fun = OpCl,col_rename='diff') %>% 
  select(- c(low,volume)) # I deselect low and volume to show the added colum ‘diff'

Source: local data frame [5,110 x 7]
Groups: symbol.x [2]

   symbol.x       date  open  high close adjusted         diff
      <chr>     <date> <dbl> <dbl> <dbl>    <dbl>        <dbl>
1      AAPL 2007-01-03 86.29 86.58 83.80 10.85709 0.0033607603
2      AAPL 2007-01-04 84.05 85.95 85.66 11.09807 0.0226055559
3      AAPL 2007-01-05 85.77 86.20 85.05 11.01904 0.0050133730
4      AAPL 2007-01-08 85.96 86.53 85.47 11.07345 0.0066309913
5      AAPL 2007-01-09 86.45 92.98 92.57 11.99333 0.0755349424

UPDATE: question was asked to add symbols into column names

Assuming you saved the above data frame into variable stocks:

lapply(unique(stocks$symbol.x), function(x) stocks[stocks$symbol.x == x,]) %>% 
lapply( function(x) {
  names(x) <- paste0(x$symbol.x[1],'.',colnames(x))
  x
}) 
[[1]]
Source: local data frame [2,555 x 7]
Groups: symbol.x [1]

   AAPL.symbol.x  AAPL.date AAPL.open AAPL.high AAPL.close
           <chr>     <date>     <dbl>     <dbl>      <dbl>
1           AAPL 2007-01-03     86.29     86.58      83.80
2           AAPL 2007-01-04     84.05     85.95      85.66
3           AAPL 2007-01-05     85.77     86.20      85.05
4           AAPL 2007-01-08     85.96     86.53      85.47
5           AAPL 2007-01-09     86.45     92.98      92.57
6           AAPL 2007-01-10     94.75     97.80      97.00
7           AAPL 2007-01-11     95.94     96.78      95.80
8           AAPL 2007-01-12     94.59     95.06      94.62
9           AAPL 2007-01-16     95.68     97.25      97.10
10          AAPL 2007-01-17     97.56     97.60      94.95
# ... with 2,545 more rows, and 2 more variables:
#   AAPL.adjusted <dbl>, AAPL.diff <dbl>

[[2]]
Source: local data frame [2,555 x 7]
Groups: symbol.x [1]

   AAL.symbol.x   AAL.date AAL.open AAL.high AAL.close
          <chr>     <date>    <dbl>    <dbl>     <dbl>
1           AAL 2007-01-03    53.89    56.92     56.30
2           AAL 2007-01-04    56.30    59.15     58.84
3           AAL 2007-01-05    58.83    59.15     58.29
4           AAL 2007-01-08    57.30    60.48     57.93
5           AAL 2007-01-09    59.44    60.20     57.90
6           AAL 2007-01-10    60.03    60.04     58.93
7           AAL 2007-01-11    59.18    61.20     61.20
8           AAL 2007-01-12    61.20    62.50     60.81
9           AAL 2007-01-16    60.81    62.10     61.96
10          AAL 2007-01-17    60.96    61.89     58.65
# ... with 2,545 more rows, and 2 more variables:
#   AAL.adjusted <dbl>, AAL.diff <dbl>

Upvotes: 2

FXQuantTrader
FXQuantTrader

Reputation: 6891

In my experience, it usually makes much more sense to keep your financial data as xts objects, for future manipulation with other technical indicators etc, unless you plan to run a prediction model in say caret in which case converting to a data.frame may make sense.

Consider keeping your symbols of data as elements of a container, such as

update_sym_md <- function(sym, env = .GlobalEnv) {
    x <- get(sym, env)
    pd <- setNames((Hi(x) - Op(x)) / Op(x), "PD")
    merge(x, pd)
}

# Adjust env for location of xts symbol data
l.syms <- lapply(Nasdaq100_Symbols, update_sym_md, env = .GlobalEnv)

lapply(l.syms, head)
# [[1]]
# AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted          PD
# 2007-01-03     86.29     86.58    81.90      83.80   309579900      10.85709 0.003360760
# 2007-01-04     84.05     85.95    83.82      85.66   211815100      11.09807 0.022605556
# 2007-01-05     85.77     86.20    84.40      85.05   208685400      11.01904 0.005013373
# 2007-01-08     85.96     86.53    85.28      85.47   199276700      11.07345 0.006630991
# 2007-01-09     86.45     92.98    85.15      92.57   837324600      11.99333 0.075534942
# 2007-01-10     94.75     97.80    93.45      97.00   738220000      12.56728 0.032190006
# 
# [[2]]
# AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted           PD
# 2007-01-03    53.89    56.92   53.89     56.30    2955600     54.80361 0.0562256273
# 2007-01-04    56.30    59.15   53.65     58.84    2614500     57.27610 0.0506217238
# 2007-01-05    58.83    59.15   57.90     58.29    1656300     56.74072 0.0054394015
# 2007-01-08    57.30    60.48   57.04     57.93    2163200     56.39028 0.0554974006
# 2007-01-09    59.44    60.20   57.56     57.90    2098600     56.36108 0.0127860366
# 2007-01-10    60.03    60.04   57.34     58.93    3892200     57.36371 0.0001666167

Also, if you do want to compare price returns/raw prices across symbols in one xts object, rather than in a data.frame, you might find the qmao package useful.

For example:

install.packages("qmao", repos="http://R-Forge.R-project.org", type = "source")
library(qmao)

pf <- makePriceFrame(Nasdaq100_Symbols)
head(pf, 3)
#               AAPL      AAL
# 2007-01-03 10.85709 54.80361
# 2007-01-04 11.09807 57.27610
# 2007-01-05 11.01904 56.74072
rf <- makeReturnFrame(Nasdaq100_Symbols)
head(rf)

#                  AAPL           AAL
# 2007-01-03           NA            NA
# 2007-01-04  0.021952895  0.0441273684
# 2007-01-05 -0.007146715 -0.0093913155
# 2007-01-08  0.004926208 -0.0061951917
# 2007-01-09  0.079799692 -0.0005179716
# 2007-01-10  0.046745798  0.0176329011

Update in response to comment from OP:

To join all the data into one row, try this:

(Aside: if you're going to use non linear prediction models on this data.frame, make sure you consider scaling your data points across securities in each row first.)

x.cbind <- do.call(cbind, l.syms)
head(x.cbind)
# AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted          PD AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted         PD.1
# 2007-01-03     86.29     86.58    81.90      83.80   309579900      10.85709 0.003360760    53.89    56.92   53.89     56.30    2955600     54.80361 0.0562256273
# 2007-01-04     84.05     85.95    83.82      85.66   211815100      11.09807 0.022605556    56.30    59.15   53.65     58.84    2614500     57.27610 0.0506217238
# 2007-01-05     85.77     86.20    84.40      85.05   208685400      11.01904 0.005013373    58.83    59.15   57.90     58.29    1656300     56.74072 0.0054394015
# 2007-01-08     85.96     86.53    85.28      85.47   199276700      11.07345 0.006630991    57.30    60.48   57.04     57.93    2163200     56.39028 0.0554974006
# 2007-01-09     86.45     92.98    85.15      92.57   837324600      11.99333 0.075534942    59.44    60.20   57.56     57.90    2098600     56.36108 0.0127860366
# 2007-01-10     94.75     97.80    93.45      97.00   738220000      12.56728 0.032190006    60.03    60.04   57.34     58.93    3892200     57.36371 0.0001666167

df.cbind <- data.frame("time" = index(x.cbind), coredata(x.cbind))
head(df.cbind)
# time AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted          PD AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted         PD.1
# 1 2007-01-03     86.29     86.58    81.90      83.80   309579900      10.85709 0.003360760    53.89    56.92   53.89     56.30    2955600     54.80361 0.0562256273
# 2 2007-01-04     84.05     85.95    83.82      85.66   211815100      11.09807 0.022605556    56.30    59.15   53.65     58.84    2614500     57.27610 0.0506217238
# 3 2007-01-05     85.77     86.20    84.40      85.05   208685400      11.01904 0.005013373    58.83    59.15   57.90     58.29    1656300     56.74072 0.0054394015
# 4 2007-01-08     85.96     86.53    85.28      85.47   199276700      11.07345 0.006630991    57.30    60.48   57.04     57.93    2163200     56.39028 0.0554974006
# 5 2007-01-09     86.45     92.98    85.15      92.57   837324600      11.99333 0.075534942    59.44    60.20   57.56     57.90    2098600     56.36108 0.0127860366
# 6 2007-01-10     94.75     97.80    93.45      97.00   738220000      12.56728 0.032190006    60.03    60.04   57.34     58.93    3892200     57.36371 0.0001666167

To better understand how qmao functions work, why not look at the examples in the documentation and go from there? ?makeReturnFrame Look at the source code to really understand what's going on (and become a better R programmer at the same time by learning good coding styles)

Upvotes: 4

David
David

Reputation: 10232

What I usually do when I use quantmods getSymbols function, is to write my own small wrapper-function, something along this:

library(quantmod)
# 1. write the wrapper function
my_wrapper <- function(tickers, from, to) {
  result_list <- lapply(tickers, function(ticker) {
    tmp <- getSymbols(ticker, from = from, to = to, auto.assign = F)
    tmp_df <- data.frame(date = index(tmp), 
                         ticker = ticker,
                         open = as.numeric(Op(tmp)), 
                         high = as.numeric(Hi(tmp)), 
                         close = as.numeric(Cl(tmp)),
                         adj = as.numeric(Ad(tmp)))
  })

  result_df <- do.call(rbind, result_list)
  return(result_df)
}

# 2. download and inspect the data
my_df <- my_wrapper(c("AAPL", "AAL"), from = "2010-01-01", to = "2016-12-31")
summary(my_df)
# date             ticker          open             high            close             adj         
# Min.   :2010-01-04   AAPL:1762   Min.   :  3.99   Min.   :  4.06   Min.   :  4.00   Min.   :  3.894  
# 1st Qu.:2011-09-30   AAL :1762   1st Qu.: 17.16   1st Qu.: 17.46   1st Qu.: 17.23   1st Qu.: 16.770  
# Median :2013-07-04               Median : 72.94   Median : 73.45   Median : 73.02   Median : 41.920  
# Mean   :2013-07-02               Mean   :168.48   Mean   :170.10   Mean   :168.40   Mean   : 49.208  
# 3rd Qu.:2015-04-06               3rd Qu.:318.11   3rd Qu.:320.39   3rd Qu.:318.23   3rd Qu.: 72.969  
# Max.   :2016-12-30               Max.   :702.41   Max.   :705.07   Max.   :702.10   Max.   :127.966  

Then, to calculate the differences I would suggest using dplyr or data.table or some other data-frame manipulation package. Here I use dplyr.

# 3. Calculate the difference using dplyr
library(dplyr)

my_rets <- my_df %>% group_by(ticker) %>% mutate(pd = (high - open) / open)

my_rets
# Source: local data frame [3,524 x 7]
# Groups: ticker [2]
# 
#         date ticker   open   high  close      adj           pd
#        <date> <fctr>  <dbl>  <dbl>  <dbl>    <dbl>        <dbl>
# 1  2010-01-04   AAPL 213.43 214.50 214.01 27.72704 0.0050133440
# 2  2010-01-05   AAPL 214.60 215.59 214.38 27.77498 0.0046132153
# 3  2010-01-06   AAPL 214.38 215.23 210.97 27.33318 0.0039649549
# 4  2010-01-07   AAPL 211.75 212.00 210.58 27.28265 0.0011806659
# 5  2010-01-08   AAPL 210.30 212.00 211.98 27.46403 0.0080837473
# 6  2010-01-11   AAPL 212.80 213.00 210.11 27.22176 0.0009398731
# 7  2010-01-12   AAPL 209.19 209.77 207.72 26.91211 0.0027725991
# 8  2010-01-13   AAPL 207.87 210.93 210.65 27.29172 0.0147206905
# 9  2010-01-14   AAPL 210.11 210.46 209.43 27.13366 0.0016657655
# 10 2010-01-15   AAPL 210.93 211.60 205.93 26.68020 0.0031764188
# # ... with 3,514 more rows

P.s. you find a good dplyr introduction here: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

Upvotes: 1

Related Questions