Reputation: 25
I have raw, messy data for time series containing around 1400 observations. Here is a snippet of what it looks like:
[new Date('2021-08-24'),1.67,1.68,0.9,null],[new Date('2021-08-23'),1.65,1.68,0.9,null],[new Date('2021-08-22'),1.62,1.68,0.9,null] ... etc
I want to pull the date and its respective value to form a tsibble in R. So, from the above values, it would be like
Date | y-variable |
---|---|
2021-08-24 | 1.67 |
2021-08-23 | 1.65 |
2021-08-22 | 1.62 |
Notice how only the first value is to be paired with its respective date - I don't need the other values. Right now, the raw data has been copied and pasted into a word document and I am unsure about how to approach data wrangling to import into R.
How could I achieve this?
Upvotes: 1
Views: 86
Reputation: 2783
How is this?
text <- "[new Date('2021-08-24'),1.67,1.68,0.9,null],[new Date('2021-08-23'),1.65,1.68,0.9,null],[new Date('2021-08-22'),1.62,1.68,0.9,null]"
df <- read.table(text = unlist(strsplit(gsub('new Date\\(|\\)', '', gsub('^.(.*).$', '\\1', text)), "].\\[")), sep = ",")
> df
V1 V2 V3 V4 V5
1 2021-08-24 1.67 1.68 0.9 null
2 2021-08-23 1.65 1.68 0.9 null
3 2021-08-22 1.62 1.68 0.9 null
Changing column names and removing the last columns is trivial from this point
Upvotes: 0
Reputation: 132706
#replace the text conncetion with a file connection if desired, the file should be a txt then
input <- readLines(textConnection("[new Date('2021-08-24'),1.67,1.68,0.9,null],[new Date('2021-08-23'),1.65,1.68,0.9,null],[new Date('2021-08-22'),1.62,1.68,0.9,null]"))
#insert line breaks
input <- gsub("],[", "\n", input, fixed = TRUE)
#remove "new Date"
input <- gsub("new Date", "", input, fixed = TRUE)
#remove parentheses and brackets
input <- gsub("[\\(\\)\\[\\]]", "", input, perl = TRUE)
#import cleaned data
DF <- read.csv(text = input, header = FALSE, quote = "'")
DF$V1 <- as.Date(DF$V1)
print(DF)
# V1 V2 V3 V4 V5
#1 2021-08-24 1.67 1.68 0.9 null
#2 2021-08-23 1.65 1.68 0.9 null
#3 2021-08-22 1.62 1.68 0.9 null
Upvotes: 1