Reputation: 49
I want to update specific rows in a CSV file that has dates with a data frame that I created in R.
01/04/20, Asset, Position, Price, Mark-to-Market
0, PORTFOLIO, NA, NA, 1000000
1, CASH, NA, NA, 1000000
02/04/20, Asset, Position, Price, Mark-to-Market, Position prior, Transaction, TC spent
0, PORTFOLIO, NA, NA, 999231, NA, NA, NA
1, CASH, NA, NA, 509866, NA, NA, NA
2, FUTURES, 500, 2516, 1258250, 0, 500, 629
3, VXc1, -5931, 47, -279795, 0, -5931, 140
, Total, Buys:, 1, Sells:, 1, TC spent:, 769
There are approximately 1000+ rows.
However, I am unable to read this CSV file using the following codes. Can anyone help me with this?
df4 <- read.csv("filename.csv")
Further, I have to add two columns (2 and 3) from df3 mentioned below in the rows of df4 that have dates (except the first row). Can anyone help me with this as well?
The code to get df3 is as follows. However, I don't know how to add the rows to df4 selectively in R.
df1 <- read.csv("filename1.csv")
df2 <- read.csv("filename2.csv")
df3 <- cbind(df2[,c(1)], df1[,c(3)], df2[,c(3)])
Upvotes: 0
Views: 56
Reputation: 161110
I'm not sure what you need for your second question, but to address the first:
txt <- readLines("filename.csv")
# Warning in readLines("filename.csv") :
# incomplete final line found on 'filename.csv'
multidf <- by(txt, cumsum(!grepl("\\S", txt)),
FUN = function(x) read.csv(text = x, strip.white = TRUE))
multidf
# cumsum(!grepl("\\S", txt)): 0
# X01.04.20 Asset Position Price Mark.to.Market
# 1 0 PORTFOLIO NA NA 1000000
# 2 1 CASH NA NA 1000000
# ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# cumsum(!grepl("\\S", txt)): 1
# X02.04.20 Asset Position Price Mark.to.Market Position.prior Transaction TC.spent
# 1 0 PORTFOLIO <NA> NA 999231 NA <NA> NA
# 2 1 CASH <NA> NA 509866 NA <NA> NA
# 3 2 FUTURES 500 2516 1258250 0 500 629
# 4 3 VXc1 -5931 47 -279795 0 -5931 140
# 5 NA Total Buys: 1 Sells: 1 TC spent: 769
The multidf
object is technically a "by"
-class object, but that's really just a glorified list
:
str(multidf)
# List of 2
# $ 0:'data.frame': 2 obs. of 5 variables:
# ..$ X01.04.20 : int [1:2] 0 1
# ..$ Asset : chr [1:2] "PORTFOLIO" "CASH"
# ..$ Position : logi [1:2] NA NA
# ..$ Price : logi [1:2] NA NA
# ..$ Mark.to.Market: int [1:2] 1000000 1000000
# $ 1:'data.frame': 5 obs. of 8 variables:
# ..$ X02.04.20 : int [1:5] 0 1 2 3 NA
# ..$ Asset : chr [1:5] "PORTFOLIO" "CASH" "FUTURES" "VXc1" ...
# ..$ Position : chr [1:5] NA NA "500" "-5931" ...
# ..$ Price : int [1:5] NA NA 2516 47 1
# ..$ Mark.to.Market: chr [1:5] "999231" "509866" "1258250" "-279795" ...
# ..$ Position.prior: int [1:5] NA NA 0 0 1
# ..$ Transaction : chr [1:5] NA NA "500" "-5931" ...
# ..$ TC.spent : int [1:5] NA NA 629 140 769
From here, you can keep it as a list (can be good, see https://stackoverflow.com/a/24376207/3358227) or try to combine into a single frame (the same link has info for that, too).
Upvotes: 1