Neo

Reputation: 3

Extract two substrings from a single string in R

I have a text field like this:

-- :location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z

I want to extract 12.839006423950195 and 77.6580810546875 and put them into separate columns in the same data frame.

The lengths of these numbers vary, so the only reliable way is to extract what sits between the first and second single quotation marks, and between the third and fourth.
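A minimal base-R sketch of the quote-position idea, assuming every string has exactly four single quotes laid out as in the example. Splitting on the quote character puts the text between quotes 1 and 2 in piece 2 and the text between quotes 3 and 4 in piece 4. The column names `lat` and `lon` are an assumption (the values look like latitude and longitude):

```r
txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"

# Split on the literal single quote; pieces 2 and 4 are the quoted values
pieces <- strsplit(txt, "'", fixed = TRUE)[[1]]
vals <- as.numeric(pieces[c(2, 4)])

# 'lat' and 'lon' are assumed column names
df <- data.frame(lat = vals[1], lon = vals[2])
```

This breaks if a string contains extra quotes, so the regex-based answers are more robust.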

I tried using str_locate_all and str_match_all, but I can't figure it out. Please help.

Thanks

Upvotes: 0

Views: 137

Answers (2)

vck

Reputation: 837

Without using any additional packages (base R only), it can be done like this:

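txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"
# Each match is a quoted number such as '12.839006423950195'; +1 skips the opening quote
start <- gregexpr("'[0-9.]+'", txt)[[1]] + 1
# match.length counts both quotes, so subtract 3 to stop on the last digit
end <- start + attr(start, "match.length") - 3
# substring() is vectorised over the start/end positions
df <- data.frame(t(substring(txt, start[1:2], end[1:2])))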

> df
                  X1               X2
1 12.839006423950195 77.6580810546875

A shorter version using lookarounds, thanks to @thelatemail:

txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"
df<-data.frame(t(regmatches(txt, gregexpr("(?<=')[0-9.]+(?=')",txt,perl=TRUE))[[1]]))
df

                  X1               X2
1 12.839006423950195 77.6580810546875
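The question asks for the values as separate columns of an existing data frame. A minimal sketch of the same regmatches idea applied to a whole column, assuming a hypothetical data frame `dat` with the text in a column `raw` (both names are made up, and the second row is invented for illustration). The pattern also allows an optional leading minus sign, since latitudes and longitudes can be negative; the names `lat` and `lon` are assumptions:

```r
# Hypothetical data: 'dat' and 'raw' are illustrative names, row 2 is made up
dat <- data.frame(
  raw = c(
    ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z",
    ":location: - '-33.8' - '151.2' :last_location_update: 2015-08-11 09:00:00.000000000 Z"
  ),
  stringsAsFactors = FALSE
)

# One character vector of quoted numbers per row; -? allows a leading minus sign
hits <- regmatches(dat$raw, gregexpr("(?<=')-?[0-9.]+(?=')", dat$raw, perl = TRUE))

# Assumes every row holds exactly two quoted numbers; rbind stacks them as rows
coords <- do.call(rbind, lapply(hits, as.numeric))
dat$lat <- coords[, 1]
dat$lon <- coords[, 2]
dat[, c("lat", "lon")]
```

If some rows can have a different number of matches, the `do.call(rbind, ...)` step needs padding or a check first.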

Upvotes: 0

akrun

Reputation: 887038

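We can use str_extract_all from the stringr package. Regex lookarounds match one or more digits or decimal points ([0-9.]+) that sit between single quotes: the lookbehind (?<=') requires a quote just before the match and the lookahead (?=') requires one just after it, so the quotes themselves are not included in the result.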

library(stringr)
lst <- lapply(str_extract_all(txt, "(?<=')[0-9.]+(?=')"), as.numeric)

If every list element has the same length (here, two numbers per string), we can bind them row-wise:

df1 <- setNames(do.call(rbind.data.frame, lst), paste0('V', 1:2))

This gives a two-column data frame with columns V1 and V2.

Data:

txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"
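Since the question mentions trying str_match_all, here is a minimal sketch of that route. It uses a capture group instead of lookarounds, a variation on the approach above rather than part of it:

```r
library(stringr)

txt <- ":location: - '12.839006423950195' - '77.6580810546875' :last_location_update: 2015-08-10 16:41:46.817000000 Z"

# str_match_all() returns one matrix per input string:
# column 1 holds the whole match (quotes included),
# column 2 holds the first capture group (the number alone).
m <- str_match_all(txt, "'([0-9.]+)'")[[1]]
vals <- as.numeric(m[, 2])
df2 <- data.frame(V1 = vals[1], V2 = vals[2])
```

The capture group avoids lookarounds entirely, at the cost of indexing into the match matrix.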

Upvotes: 1
