Remove html tags from vector when space b/w tags and text varies in r

Question

I have the following vector:

vec<-c("
			
			
				8900 E Runstack Rd 
			
			
			
			Scottsdale,  AZ  
					85251
			"                              , 
"
			
			
				330 Orange Boulevard
			
			
			
			Beverly Hills,  CA  
					90212
			"                              , 
"
			
			
				645 Newport Center Drive 
			
			
			
			Newport Beach,  CA  
					92660
			"                              , 
"
			
			
				5000 Westlake Depot Road 
			
			
			
			Palo Alto,  CA  
					94304
			"                              , 
"
			
			
				646 Lucern Road
			
			
			
			San Diego,  CA  
					92108
			"                              
)

I would like to remove all the \n and \t characters. I tried the following:

str_replace_all(vec, "\n|\t", " ")
[1] "             8900 E Runstack Rd                 Scottsdale,  AZ        85251    "         
[2] "             330 Orange Boulevard                Beverly Hills,  CA        90212    "     
[3] "             645 Newport Center Drive                 Newport Beach,  CA        92660    "
[4] "             5000 Westlake Depot Road                 Palo Alto,  CA        94304    "    
[5] "             646 Lucern Road                San Diego,  CA        92108    "

But that converted them to whitespace. I tried this:

str_replace_all(vec, "\n|\t", "")
[1] "8900 E Runstack Rd Scottsdale,  AZ  85251"          "330 Orange BoulevardBeverly Hills,  CA  90212"     
[3] "645 Newport Center Drive Newport Beach,  CA  92660" "5000 Westlake Depot Road Palo Alto,  CA  94304"    
[5] "646 Lucern RoadSan Diego,  CA  92108"

But note that in some instances there is no whitespace where there should be (such as index 2, 330 Orange BoulevardBeverly Hills,  CA  90212). The problem is that \n is attached directly to the end of the text in some elements, while in others a space comes first. How can I replace \n with a space only when it immediately follows a letter, and with nothing in all other cases? I'm looking for the following result:

[1] "8900 E Runstack Rd Scottsdale,  AZ  85251"          "330 Orange Boulevard Beverly Hills,  CA  90212"     
[3] "645 Newport Center Drive Newport Beach,  CA  92660" "5000 Westlake Depot Road Palo Alto,  CA  94304"    
[5] "646 Lucern Road San Diego,  CA  92108"

I can achieve the above using str_squish(vec) after having run str_replace_all(vec, "\n|\t", " "), but I would like a single-line solution.
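
For comparison, a minimal base R sketch of that two-step idea (turn whitespace into spaces, then squeeze and trim) is shown here. The one-element sample `x` is a hypothetical stand-in for `vec`, and `trimws(gsub(...))` is used as an assumed rough equivalent of the stringr calls, not the exact code from the question. Note that squeezing whitespace also reduces the double spaces inside the address to single ones.

```r
# Hypothetical one-element sample shaped like the entries of `vec`.
x <- "\n\t\t\t330 Orange Boulevard\n\t\t\t\n\t\t\tBeverly Hills,  CA  \n\t\t\t\t\t90212\n\t\t\t"

# Collapse every run of whitespace (spaces, tabs, newlines) into one space,
# then strip the leading and trailing space that this leaves behind.
two_step <- trimws(gsub("\\s+", " ", x))

print(two_step)
```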

Sada93 · Accepted Answer

A single line is possible, but it costs readability and the pattern becomes more complex.

gsub("^[\\n|\\t]+([0-9a-zA-Z ,]+)[\\n|\\t]+([a-zA-Z ,]+)[\\n|\\t]+([0-9]{5})[\\n|\\t]+$", "\\1 \\2 \\3", vec)

Here we take advantage of the fact that each address follows the same three-part pattern:

Street address
City, State
5-digit postal code
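
If some addresses do not follow this three-part shape, a more general sketch may be easier to adapt. The version below is an alternative, not part of the accepted answer: it assumes base R with `perl = TRUE` for lookarounds, and the two-element `addr` vector is a hypothetical sample mirroring the question's data (one street line ends in a space, the other does not). The inner call inserts a space only where a run of newlines and tabs sits directly between two non-whitespace characters; the outer call drops every remaining newline and tab.

```r
# Hypothetical sample: element 1 has a space before the line break,
# element 2 has the line break attached directly to the text.
addr <- c(
  "\n\t\t\t8900 E Runstack Rd \n\t\t\t\n\t\t\tScottsdale,  AZ  \n\t\t\t\t\t85251\n\t\t\t",
  "\n\t\t\t330 Orange Boulevard\n\t\t\t\n\t\t\tBeverly Hills,  CA  \n\t\t\t\t\t90212\n\t\t\t"
)

# Inner gsub: a newline/tab run with non-whitespace on both sides becomes " ".
# Outer gsub: all other newlines and tabs are removed outright.
clean <- gsub("[\n\t]+", "",
              gsub("(?<=\\S)[\n\t]+(?=\\S)", " ", addr, perl = TRUE))

print(clean)
```

Existing spaces are left untouched, so the double spaces after the comma and the state code are preserved.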
