Reputation: 262
I have a dataframe with 8000 columns and 3785 rows. Columns are company names,so,I want to calculate to spread of a company by subtrating its ask price from bid price i.e Spread of company/stock = A.ASK- A.BID. In the column name A.ASK indicates that A is name of company and .ASK indicated it is the ASK price of the company where .BID is the BID price of A . And in my dataframe all ask and bid price of company are side by side column just as illustrated in example dataframe below. Additionally I have missing data for instance if company C started trading in year 2001 it will have NA for year 2000. So, I want is not ignore the date column for calculation and additionally where the there are NA for ask and bid of company it return me NA in resultant column
Date A . ASK A .BID C. ASK C. BID
31/12/1999 NA NA NA NA
03/01/2000 NA NA NA NA
04/01/2000 82 77 NA NA
05/01/2000 82 77 NA NA
06/01/2000 82 77 NA NA
07/01/2000 82 77 NA NA
10/01/2000 82 77 NA NA
11/01/2000 82 77 NA NA
12/01/2000 NA NA NA NA
13/01/2000 NA NA NA NA
14/01/2000 NA NA 70 67
17/01/2000 NA NA 70 67
18/01/2000 97 94 70 67
19/01/2000 97 92 70 67
df2<-df1
Date A C
31/12/1999 NA NA
03/01/2000 NA NA
04/01/2000 5 NA
05/01/2000 5 NA
06/01/2000 5 NA
07/01/2000 5 NA
10/01/2000 5 NA
11/01/2000 5 NA
12/01/2000 NA NA
13/01/2000 NA NA
14/01/2000 NA 3
17/01/2000 NA 3
18/01/2000 3 3
19/01/2000 5 3
Your help is highly appreciated
Upvotes: 1
Views: 1143
Reputation: 1532
Having nicely formatted data which alternates between ask and bid price for each company makes this relatively straightforward. The following code should do what you're looking for.
# Import data
df <- read.table(text =
"Date A.ASK A.BID C.ASK C.BID
31/12/1999 NA NA NA NA
03/01/2000 NA NA NA NA
04/01/2000 82 77 NA NA
05/01/2000 82 77 NA NA
06/01/2000 82 77 NA NA
07/01/2000 82 77 NA NA
10/01/2000 82 77 NA NA
11/01/2000 82 77 NA NA
12/01/2000 NA NA NA NA
13/01/2000 NA NA NA NA
14/01/2000 NA NA 70 67
17/01/2000 NA NA 70 67
18/01/2000 97 94 70 67
19/01/2000 97 92 70 67",
header = TRUE
)
# Define a sequence which selects every second column
# ask_cols starts at column 2
# bid_cols starts at column 3
ask_cols <- (1:((ncol(df)-1)/2))*2
bid_cols <- (1:((ncol(df)-1)/2))*2+1
# Use ask_cols and bid_cols to select columns from df and calculate
df2 <- df[, ask_cols]-df[, bid_cols]
# Add the date column to df2
df2 <- cbind(df[, 1], df2)
# We will use stringr for extracting company names to define column names
library(stringr)
colnames(df2) <- c("Date", str_extract(colnames(df[, ask_cols]), "([A-Za-z]+)"))
Giving
> df2
Date A C
1 31/12/1999 NA NA
2 03/01/2000 NA NA
3 04/01/2000 5 NA
4 05/01/2000 5 NA
5 06/01/2000 5 NA
6 07/01/2000 5 NA
7 10/01/2000 5 NA
8 11/01/2000 5 NA
9 12/01/2000 NA NA
10 13/01/2000 NA NA
11 14/01/2000 NA 3
12 17/01/2000 NA 3
13 18/01/2000 3 3
14 19/01/2000 5 3
Edit: A better way to define ask_cols and bid_cols is using the seq
function
ask_cols <- seq(2, ncol(df), 2)
bid_cols <- seq(3, ncol(df), 2)
Edit 2: A better regular expression for matching company names is to use a lookahead to match any series of characters which is followed by .ASK.
colnames(df2) <- c("Date", str_extract(colnames(df[, ask_cols]), ".*(?=\\.ASK)"))
Upvotes: 2