Conditional String Match R Character Vector Collapse Select Elements

Question

I have a character vector in which I'd like to match a specific string, collapse the element containing that match with only the next element, and continue this way until the end of the vector. For example, here is one such case:

'"FundSponsor:Blackrock Advisors" "Category:"  "Tax-Free Income-Pennsylvania"  "Ticker:"  "MPA" "NAV Ticker:" "XMPAX"                          "Average Daily Volume (shares):" "26,000"                         "Average Daily Volume (USD):"    "$0.335M"                        "Inception Date:"  "10/30/1992" "Inception Share Price:" "$15.00"                         "Inception NAV:" "$14.18" "Tender Offer:" "No"                             "Term:" "No"'

Combining each element that contains a : with only the element following it would be ideal, but I've struggled with the paste function: it collapses the entire vector into a single element, which is not the targeted result I'm looking for.

Here's an example of what I'd like a portion of the revised output to look like:

"Inception Share Price:$15.00"

niko · Accepted Answer

Here is something that might help:

First split the string using strsplit, then paste together the elements that belong together.

# split the string
vec <- unlist(strsplit(string, '(?=\")(?=\")', perl = TRUE))
vec <- vec[! vec %in% c(' ', '\"')]
# this is what vec looks like now
head(vec)
# [1] "FundSponsor:Blackrock Advisors" "Category:"                      "Tax-Free Income-Pennsylvania"   "Ticker:"                        "MPA"                           
# [6] "NAV Ticker:"    
#
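# now paste the elements
ind <- grepl(':.+', vec)         # TRUE for elements that already hold "key:value"
pos <- which(!ind)               # the remaining elements alternate key, value
key <- pos[c(TRUE, FALSE)]       # positions of the keys
val <- pos[c(FALSE, TRUE)]       # positions of the values that follow them
vec[key] <- paste0(vec[key], vec[val])
vec <- vec[!seq_along(vec) %in% val]   # drop the values that were merged into their keys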
head(vec)
# [1] "FundSponsor:Blackrock Advisors"        "Category:Tax-Free Income-Pennsylvania" "Ticker:MPA"                            "NAV Ticker:XMPAX"                     
# [5] "Average Daily Volume (shares):26,000"  "Average Daily Volume (USD):$0.335M"

with the data

string = "\"FundSponsor:Blackrock Advisors\" \"Category:\" \"Tax-Free Income-Pennsylvania\" \"Ticker:\" \"MPA\" \"NAV Ticker:\" \"XMPAX\" \"Average Daily Volume (shares):\" \"26,000\" \"Average Daily Volume (USD):\" \"$0.335M\" \"Inception Date:\" \"10/30/1992\" \"Inception Share Price:\" \"$15.00\" \"Inception NAV:\" \"$14.18\" \"Tender Offer:\" \"No\" \"Term:\" \"No\""
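If the input is already a character vector, as in the question, the splitting step may not be needed at all. The following is only a minimal sketch, assuming every key ends in ":" and is immediately followed by its value (no two keys in a row); the names x, key and out are made up for illustration.

```r
x <- c("FundSponsor:Blackrock Advisors", "Category:",
       "Tax-Free Income-Pennsylvania", "Inception Share Price:", "$15.00")

# positions of elements that end in ":" and still have an element after them
key <- which(grepl(":$", x))
key <- key[key < length(x)]

out <- x
out[key] <- paste0(x[key], x[key + 1])        # glue each key to its value
if (length(key) > 0) out <- out[-(key + 1)]   # drop the values that were merged
out
# [1] "FundSponsor:Blackrock Advisors"        "Category:Tax-Free Income-Pennsylvania"
# [3] "Inception Share Price:$15.00"
```

Elements that already contain both key and value, such as the first one, are left untouched because they do not end in ":".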

Explanation

The regex (?=\")(?=\") consists of two lookaheads. The syntax (?=*something*) matches a position that is immediately followed by *something*, without consuming it, so the pattern matches every position that precedes a \" (the second lookahead simply repeats the first). Because these matches are zero-length, strsplit cuts the string before each \" and also right after it, which leaves every \" as a separate element.
The strsplit(...) call above therefore also creates elements of the form \" and ' ' (for example, '\"Category:\" \"...' becomes the vector '\"';'Category:';'\"';' ';'\"';'...'). Using ! vec %in% c(...) removes those unwanted elements.
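To see what the split produces, here is a small sketch on a toy string (demo, parts and kept are illustrative names, not part of the code above). It uses the same regex and the same filter:

```r
demo <- '"a:" "b"'

# same pattern as above: zero-length matches in front of every double quote
parts <- strsplit(demo, '(?=\")(?=\")', perl = TRUE)[[1]]
parts
# [1] "\"" "a:" "\"" " "  "\"" "b"  "\""

# drop the lone quotes and the separating blank
kept <- parts[!parts %in% c(' ', '\"')]
kept
# [1] "a:" "b"
```

The text between the quotes sits at positions 2, 6, 10, and so on, which is the pattern the addendum relies on.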

Addendum

If the string contains elements of the form "string:" followed by a blank value " ", remove the line vec <- vec[! vec %in% c(' ', '\"')] from the code above and add these lines in its place:

vec <- vec[seq(2L, length(vec), 4L)]
vec[vec == ' '] <- NA_character_
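As a rough illustration of this variant, the sketch below runs the split and the two addendum lines on a short made-up string (string_blank is a hypothetical name) in which "Term:" has a blank value:

```r
string_blank <- '"Term:" " " "Ticker:" "MPA"'

vec <- unlist(strsplit(string_blank, '(?=\")(?=\")', perl = TRUE))
vec <- vec[seq(2L, length(vec), 4L)]   # keep only the text between the quotes
vec[vec == ' '] <- NA_character_       # mark blank values as missing
vec
# [1] "Term:"   NA        "Ticker:" "MPA"
```

Keeping the blank value as NA preserves the key/value alternation, so the pasting step still pairs each key with the right element; note that paste0 would then produce "Term:NA" for the blank entry.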
