Reputation: 41
I've got two dataframes, one with 2016-2020 and one with 2015-2020. I would like to extract the 2015 lines from dataframe2 and insert to dataframe1.
Dateframe1 has date, hits, keyword (same as dataframe2) The merged line must be matched by keywords. So 2015 "food" from dataframe2, must be inserted before 01.01.2016 "food" in dataframe1.
Ex: Dataframe1:
*date hits keyword*
2016-01-01 10 food
2016-31-01 5 food
2017-31-01 5 food
2018-31-01 5 food
2018-31-01 5 food
2016-01-01 55 drink
2016-22-01 1 drink
2017-31-05 2 drink
2018-31-01 1 drink
So I want all lines in 2015 containing food to be inserted above 2016 food in dataframe1. And the same with drink. All drink 2015 from dataframe2 must be inserted before 2016 drink in dataframe1.
End result:
*date hits keyword*
**2015-31-01 5 food**
2016-01-01 10 food
2016-31-01 5 food
2017-31-01 5 food
2018-31-01 5 food
2018-31-01 5 food
**2015-31-01 7 food**
2016-01-01 55 drink
2016-22-01 1 drink
2017-31-05 2 drink
2018-31-01 1 drink
Upvotes: 1
Views: 54
Reputation: 160677
Three basic frame operations:
Filter Dataframe2
to only include the rows we want.
Dataframe2[grepl("^2015", Dataframe2$date),]
# date x y
# 2 2015-31-01 5 food
# 3 2015-31-01 5 food
# 4 2015-31-01 5 food
Combine row-wise using rbind
.
Dataframe1 <- rbind(Dataframe2[grepl("^2015", Dataframe2$date),], Dataframe1)
Dataframe1
# date x y
# 2 2015-31-01 5 food
# 3 2015-31-01 5 food
# 4 2015-31-01 5 food
# 1 2016-01-01 10 food
# 21 2016-31-01 5 food
# 31 2017-31-01 5 food
# 41 2018-31-01 5 food
# 5 2018-31-01 5 food
# 6 2016-01-01 55 drink
# 7 2016-22-01 1 drink
# 8 2017-31-05 2 drink
# 9 2018-31-01 1 drink
Sort the resulting data.
Dataframe1[order(Dataframe1$date),]
# date x y
# 2 2015-31-01 5 food
# 3 2015-31-01 5 food
# 4 2015-31-01 5 food
# 1 2016-01-01 10 food
# 6 2016-01-01 55 drink
# 7 2016-22-01 1 drink
# 21 2016-31-01 5 food
# 31 2017-31-01 5 food
# 8 2017-31-05 2 drink
# 41 2018-31-01 5 food
# 5 2018-31-01 5 food
# 9 2018-31-01 1 drink
I should note that these are all using string date
values that will not sort correctly: they are sorting lexicographically, which is not numeric. Realize that
20 > 3
# [1] TRUE
"20" > "3"
# [1] FALSE
To do this right, the columns would be proper Date
class columns:
# starting with a fresh `Dataframe1`
Dataframe1$date <- as.Date(Dataframe1$date, format = "%Y-%d-%m")
Dataframe2$date <- as.Date(Dataframe2$date, format = "%Y-%d-%m")
## Filter
lims <- as.Date(c("2015-01-01", "2015-31-12"), format = "%Y-%d-%m")
Dataframe2[ lims[1] <= Dataframe2$date & Dataframe2$date <= lims[2], ] # for demo
## Combine
Dataframe1 <- rbind(Dataframe2[ lims[1] <= Dataframe2$date & Dataframe2$date <= lims[2], ], Dataframe1)
## Order
Dataframe1[order(Dataframe1$date),]
Note that R will always show them in year-month-date order when in a Date
class object. If you want it displayed something else, I suggest you do that only in report generation (using format(Dataframe1$date, format="...")
, see ?strptime
for format hints).
Upvotes: 1