game cmdr

Reputation: 3

Create a new vector with text from strings in an old vector in R

I'm working with a data frame in RStudio. One column, PODMap, has sentences such as "At my property there is a house at 38.1234, 123.1234 and also I have a car". I want to create two new columns, one for the latitude and one for the longitude.

fvalue is the data frame. So far I have:

matches <- regmatches(fvalue[,"PODMap"], regexpr("..\\.....", fvalue[,"PODMap"], perl = TRUE))

Since the only periods in the text are in the latitude and longitude, this returns the first latitude or longitude listed in each string (I'm still working on a regex to grab the longitude that follows the latitude, but that's a different question). The problem is that if my vector is c("test 38.1111", "x", "test 38.2222"), it returns c("38.1111", "38.2222"). The values are right, but the vector is the wrong length for my data frame and the entries won't line up with the rows. I need it to return a blank, a 0, or NA for each string that doesn't match the regular expression, so that the result can be added to the data frame as a column. If I'm going about this entirely wrong, let me know about that too.

Upvotes: 0

Views: 237

Answers (1)

Chabo

Reputation: 3000

You can use regexec, which returns a list of the same length as the input, so you don't lose the positions of the strings that have no match:

PODMap<-c("At my property there is a house at 38.1234, 123.1234 and also I have a",
           "Test TEst TEST Tes T 12.1234, 123.4567 test Tes",
           "NO LONG HEre Here No Lat either",
           "At my property there is a house at 12.1234, 423.1234 and also I have ")

Index<-c(1:4)

fvalue<-data.frame(Index,PODMap)

matches <- regmatches(fvalue[,"PODMap"], regexec("..\\.....", fvalue[,"PODMap"], perl = TRUE))


> matches
[[1]]
[1] "38.1234"

[[2]]
[1] "12.1234"

[[3]]
character(0)

[[4]]
[1] "12.1234"
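
To get from that list to a column of the right length, each empty element can be replaced with NA. The following is a minimal sketch on the toy vector from the question, not a definitive implementation; the names x, m, and lat are illustrative, and the pattern is tightened to digits only, which is an assumption about the data.

```r
# Toy data from the question; the names x, m and lat are illustrative.
x <- c("test 38.1111", "x", "test 38.2222")

# regexec keeps one list element per input string, so non-matches
# show up as character(0) instead of being dropped.
m <- regmatches(x, regexec("[0-9]{2}\\.[0-9]{4}", x))

# Replace each empty element with NA so the result has one entry per row.
lat <- vapply(
  m,
  function(hit) if (length(hit) == 0) NA_character_ else hit[1],
  character(1)
)
lat <- as.numeric(lat)
```

Because lat has the same length as x, it can be assigned directly as a data frame column, for example fvalue$lat <- lat.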

Using the stringr package, we can get both the latitude and the longitude.

library(stringr)
matches<-str_match_all(fvalue[,"PODMap"], ".\\d\\d\\.\\d\\d\\d\\d")

> matches
[[1]]
     [,1]      
[1,] " 38.1234"
[2,] "123.1234"

[[2]]
     [,1]      
[1,] " 12.1234"
[2,] "123.4567"

[[3]]
     [,1]

[[4]]
     [,1]      
[1,] " 12.1234"
[2,] "423.1234"

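The \\d matches any digit 0-9, which keeps words out of the matches. Note that the leading . matches any single character, which is why the two-digit values come back with a leading space. We use str_match_all to get every match in each string, because regmatches combined with regexpr or regexec only returns the first one. For a string with no match, str_match_all returns an empty (zero-row) matrix instead of character(0), which should not be a problem.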
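
If the goal is two numeric columns, capture groups in base R are one way to get there without stringr. This is a hedged sketch: the pattern assumes the latitude always comes first and is separated from the longitude by a comma and/or spaces, and the helper pick and the column names lat and lon are hypothetical.

```r
# Sample data in the shape used above.
PODMap <- c("At my property there is a house at 38.1234, 123.1234 and also I have a",
            "NO LONG HEre Here No Lat either",
            "Test TEst TEST Tes T 12.1234, 123.4567 test Tes")
fvalue <- data.frame(PODMap, stringsAsFactors = FALSE)

# Two capture groups: latitude, then longitude (assumed order).
pattern <- "(-?[0-9]{1,3}\\.[0-9]+)[, ]+(-?[0-9]{1,3}\\.[0-9]+)"

# For a match, regexec yields c(full match, group 1, group 2);
# for a non-match it yields character(0).
hits <- regmatches(fvalue$PODMap, regexec(pattern, fvalue$PODMap))

# Hypothetical helper: take element i of a hit, or NA when there was no match.
pick <- function(hit, i) if (length(hit) == 0) NA_real_ else as.numeric(hit[i])

fvalue$lat <- vapply(hits, pick, numeric(1), i = 2)
fvalue$lon <- vapply(hits, pick, numeric(1), i = 3)
```

Rows without a coordinate pair end up with NA in both columns, so the new columns always line up with the original rows.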

Check out this regex demo

Upvotes: 1
