Reputation: 2940
I have data like this:
df <- data.frame(V1=c("stuff", "2nd June 2018", "otherstuff1", "baseball","", "142", "otherstuff2", "football","", "150", "4th June 2018", "otherstuff99", "hockey","", "160", "otherstuff100", "baseball", "", "190", "otherstuff5", "lacrosse", "200", "9th June 2018"), stringsAsFactors = F)
I want to insert rows by a condition: a new "date" cell immediately before and after every date value, like bookends. There is a random number of "otherstuff" cells between the dates:
df.desired <- data.frame(V1=c("stuff","date", "2nd June 2018","date" ,"otherstuff1", "baseball","", "142", "otherstuff2", "football","", "150","date", "4th June 2018","date", "otherstuff99", "hockey","", "160", "otherstuff100", "baseball", "", "190", "otherstuff5", "lacrosse", "200", "date", "9th June 2018","date"), stringsAsFactors=F)
Upvotes: 1
Views: 236
Reputation: 6771
I'd do it like this. The dmy function from the lubridate package appears to recognize all the date formats in your example, but that might not always hold if you have a wider variety of date strings:
library(lubridate)
# dmy() returns NA for anything it cannot parse as day-month-year;
# quiet = TRUE suppresses the "failed to parse" warning for those entries
df$date_try <- dmy(df$V1, quiet = TRUE)
# the ones that are not NA must be dates
ind <- which(!is.na(df$date_try))
# give each bookend a fractional index just before and just after its date
new_ind <- c(seq_along(df$date_try), ind + 0.5, ind - 0.5)
# the bookends are appended at the end of the vector for now
new_V1 <- c(df$V1, rep("date", length(ind) * 2))
# re-order by index so the bookends land at the proper locations,
# and create your desired output dataframe
df.new <- data.frame(V1 = new_V1[order(new_ind)], stringsAsFactors = FALSE)
> head(df.new)
             V1
1         stuff
2          date
3 2nd June 2018
4          date
5   otherstuff1
6      baseball
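To make the fractional-index trick easier to follow, here is a minimal sketch on a toy vector using base R only. The names x, keys, values and result are made up for illustration, and ind is assumed to be already known (in the answer above it comes from dmy()):

```r
# Toy input: positions 2 and 4 hold the values to be bookended.
x   <- c("a", "D1", "b", "D2")
ind <- c(2, 4)  # assumed known here; the answer derives it from dmy()

# Original elements keep integer keys 1..n; each bookend gets a
# fractional key just before (ind - 0.5) or just after (ind + 0.5).
keys   <- c(seq_along(x), ind - 0.5, ind + 0.5)
values <- c(x, rep("date", 2 * length(ind)))

# Ordering by key slots every bookend next to its target element.
result <- values[order(keys)]
print(result)
```

Because no two keys are equal, order() has no ties to break, so each "date" lands exactly between its date and the neighbouring element.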
Upvotes: 1
Reputation: 28339
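There are three steps that you need to do:
1. Find the positions of the dates (using grep)
2. Expand the original data.frame with space for the date rows
3. Add date to the new data.frame
Code: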
# Find position of `month year`
foo <- grep(paste(month.name, "\\d+$", collapse = "|"), df$V1)
# Expand original data.frame with space for the date rows (each date row is tripled)
dfDesired <- data.frame(x = df$V1[sort(c(1:nrow(df), foo, foo))], stringsAsFactors = FALSE)
# Find position for date in expanded data.frame
bar <- foo + seq(by = 2, length.out = length(foo))
# Add date
dfDesired$x[c(bar - 1, bar + 1)] <- "date"
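Notes:
grep is called with the pattern built by paste(month.name, "\\d+$", collapse = "|"), which evaluates to:
"January \d+$|February \d+$|March \d+$|April \d+$|May \d+$|June \d+$|July \d+$|August \d+$|September \d+$|October \d+$|November \d+$|December \d+$"
We need the bar positions because the date rows in the new data.frame are shifted relative to foo by 1, 3, 5, and so on: every earlier date adds two rows, plus one more for the copy placed before the date itself.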
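The offset arithmetic can be checked on a small example. This is only a sketch: the vector v and the name expanded are hypothetical, while foo and bar follow the answer's code:

```r
# Toy input with dates at original positions 2 and 4.
v   <- c("a", "1st June 2018", "b", "2nd July 2018", "c")
foo <- grep(paste(month.name, "\\d+$", collapse = "|"), v)

# Triple every date row: the middle copy stays a date,
# the two outer copies will be overwritten with "date".
expanded <- v[sort(c(seq_along(v), foo, foo))]

# The middle copy of the k-th date sits at foo[k] + (2k - 1):
# two extra rows per earlier date, plus one for its own leading copy.
bar <- foo + seq(from = 1, by = 2, length.out = length(foo))

expanded[c(bar - 1, bar + 1)] <- "date"
print(expanded)
```

Here foo is 2 and 4, so bar is 3 and 7, and the rows on either side of those positions become "date".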
Upvotes: 3