Reputation: 113
I am trying to merge two dataframes based on Time
column. They each have differently formatted values for time. They look the following way:
Example datasets for Labourproductivity
and Depressiondframe
:
Labourproductivity <- read.csv(text="
Time,LabourProductivity
2004 Q1,96.6
Q2,96.9
Q3,96.9
Q4,97.1
2005 Q1,97.6
Q2,99.0")
Depressiondframe <- read.csv(text="
Time,DepressionCount
2004.00,875
2004.25,820
2004.50,785
2004.75,857
2005.00,844")
Data:
Labourproductivity
Time LabourProductivity
1 2004 Q1 96.6
2 Q2 96.9
3 Q3 96.9
4 Q4 97.1
5 2005 Q1 97.6
6 Q2 99.0
Depressiondframe
Time DepressionCount
1 2004.00 875
2 2004.25 820
3 2004.50 785
4 2004.75 857
5 2005.00 844
How would I be able to merge the two dataframes based on Time
?
Upvotes: 2
Views: 188
Reputation: 92300
Here's a possible approach. We will use the zoo
package which will easily convert Depressiondframe$Time
to the desired format using zoo::as.yearqtr
function, while the second one is trickier and we will use stringi::stri_extract_first_regex
and zoo::na.locf
in order to handle it. Then, just a simple merge
will finish the task
library(zoo)
library(stringi)
Depressiondframe$Time <- as.character(as.yearqtr(Depressiondframe$Time))
Labourproductivity$Time <- with(Labourproductivity,
paste(na.locf(stri_extract_first_regex(Time, "\\d{4}")),
stri_extract_first_regex(Time, "Q\\d")))
merge(Depressiondframe, Labourproductivity, by = "Time", all = TRUE)
# Time DepressionCount LabourProductivity
# 1 2004 Q1 875 96.6
# 2 2004 Q2 820 96.9
# 3 2004 Q3 785 96.9
# 4 2004 Q4 857 97.1
# 5 2005 Q1 844 97.6
# 6 2005 Q2 NA 99.0
Upvotes: 1