Reputation: 5193
I have the following vector:
vec<-c("\n\t\t\t\n\t\t\t\n\t\t\t\t8900 E Runstack Rd \n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tScottsdale, AZ \n\t\t\t\t\t85251\n\t\t\t" ,
"\n\t\t\t\n\t\t\t\n\t\t\t\t330 Orange Boulevard\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tBeverly Hills, CA \n\t\t\t\t\t90212\n\t\t\t" ,
"\n\t\t\t\n\t\t\t\n\t\t\t\t645 Newport Center Drive \n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tNewport Beach, CA \n\t\t\t\t\t92660\n\t\t\t" ,
"\n\t\t\t\n\t\t\t\n\t\t\t\t5000 Westlake Depot Road \n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tPalo Alto, CA \n\t\t\t\t\t94304\n\t\t\t" ,
"\n\t\t\t\n\t\t\t\n\t\t\t\t646 Lucern Road\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tSan Diego, CA \n\t\t\t\t\t92108\n\t\t\t"
)
I would like to remove all the \n and \t. I tried the following:
str_replace_all(vec, "\n|\t", " ")
[1] " 8900 E Runstack Rd Scottsdale, AZ 85251 "
[2] " 330 Orange Boulevard Beverly Hills, CA 90212 "
[3] " 645 Newport Center Drive Newport Beach, CA 92660 "
[4] " 5000 Westlake Depot Road Palo Alto, CA 94304 "
[5] " 646 Lucern Road San Diego, CA 92108 "
But that converted them to whitespace. I tried this:
str_replace_all(vec, "\n|\t", "")
[1] "8900 E Runstack Rd Scottsdale, AZ 85251" "330 Orange BoulevardBeverly Hills, CA 90212"
[3] "645 Newport Center Drive Newport Beach, CA 92660" "5000 Westlake Depot Road Palo Alto, CA 94304"
[5] "646 Lucern RoadSan Diego, CA 92108"
But note that in some instances there is no whitespace where there should be (such as index 2, 330 Orange BoulevardBeverly Hills, CA 90212). The problem is that \n is attached directly to the end of some text, while in other instances a space comes first. How can I replace \n with a space only when it immediately follows a letter, and remove it in all other circumstances? I'm looking for the following result:
[1] "8900 E Runstack Rd Scottsdale, AZ 85251" "330 Orange Boulevard Beverly Hills, CA 90212"
[3] "645 Newport Center Drive Newport Beach, CA 92660" "5000 Westlake Depot Road Palo Alto, CA 94304"
[5] "646 Lucern Road San Diego, CA 92108"
I can achieve the above using str_squish(vec) after having run str_replace_all(vec, "\n|\t", " "), but I would like a single-line solution.
Upvotes: 1
Views: 56
Reputation: 2835
A single line is possible but we lose readability, and it does indeed become more complex.
gsub("^[\n\t]+([0-9a-zA-Z ,]+?) ?[\n\t]+([a-zA-Z ,]+?) ?[\n\t]+([0-9]{5})[\n\t]+$", "\\1 \\2 \\3", vec, perl = TRUE)
Here we take advantage of the fact that the address contains a pattern of
Upvotes: 1
Reputation: 13319
Try: stringr::str_remove_all(vec,"[\n\t]")
Result (this can be assigned back to your data):
[1] "8900 E Runstack Rd Scottsdale, AZ 85251"
[2] "330 Orange BoulevardBeverly Hills, CA 90212"
[3] "645 Newport Center Drive Newport Beach, CA 92660"
[4] "5000 Westlake Depot Road Palo Alto, CA 94304"
[5] "646 Lucern RoadSan Diego, CA 92108"
Per @Sada93's comment, we lose a space in the second element (and in the fifth). This is admittedly not the best way to reintroduce the space, but here it is:
gsub("BoulevardBeverly","Boulevard Beverly",vec1) # vec1 is the result of the above transformation
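A slightly more general sketch of the same idea, rather than hard-coding one street name: insert a space wherever a lowercase letter is immediately followed by an uppercase one. This rests on the assumption that such a pair only occurs where a line break was removed, so it would wrongly split names like "McDonald". The name `fixed` and the two-element `vec1` below are only for illustration.

```r
# Two of the strings produced by removing \n and \t outright
vec1 <- c("330 Orange BoulevardBeverly Hills, CA 90212",
          "646 Lucern RoadSan Diego, CA 92108")

# Put a space between a lowercase letter and the uppercase letter glued to it.
# Assumption: lower-then-upper only happens where a newline used to be.
fixed <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", vec1)
print(fixed)
```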
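Another way to reintroduce spaces, just for illustration (it strips every space, then adds one back only between a digit and the letter that follows it, so it does not restore the full address):
vec1<-stringr::str_replace_all(vec,"[\n\t]","")
vec2<-stringr::str_remove_all(vec1," ")
gsub("([0-9])([a-zA-Z])","\\1 \\2",vec2)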
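For a single line that does not depend on the address layout, one option is to collapse every run of whitespace (spaces, tabs and newlines together) into one space and then trim the ends. Below is a minimal base R sketch of that; `cleaned` is just an illustrative name, and only the first two elements of `vec` are repeated to keep it short. `stringr::str_squish(vec)` on its own should give the same result, since it trims and collapses all whitespace, not only spaces.

```r
# First two elements of the vector from the question
vec <- c("\n\t\t\t\n\t\t\t\n\t\t\t\t8900 E Runstack Rd \n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tScottsdale, AZ \n\t\t\t\t\t85251\n\t\t\t",
         "\n\t\t\t\n\t\t\t\n\t\t\t\t330 Orange Boulevard\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\tBeverly Hills, CA \n\t\t\t\t\t90212\n\t\t\t")

# Replace each run of whitespace with a single space, then strip the
# leading and trailing space left at both ends.
cleaned <- trimws(gsub("[[:space:]]+", " ", vec))
print(cleaned)
```

Because a space followed by \n is treated as one run, it no longer matters whether the street line ends with a space or not.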
Upvotes: 0