sds

Reputation: 60072

Parsing strings into data frames

I have a bunch of strings which look like this:

 [3] "  3. Wiki: Los Angeles 3:58pm; score:1.959502"        
 [4] "  4. Wiki: Boston 6:58pm; score:1.959502"             
 [5] "  5. Disambiguation: 'Boon; score:1.934644"            
 [6] "  6. Wiki: The Note (album)\"; score:1.786931"          

I parse them into a data frame like this:

read.csv(text=sub("^  [0-9]*\\. (Wiki|Disambiguation): (.*); score:([0-9\\.]*)$","\"\\2\",\\3",ll),
         header=FALSE,stringsAsFactors=FALSE)

The trouble is that the \\2 text, which I enclose in quotes, may itself contain quotes (both double and single).

How do I deal with this?

Upvotes: 1

Views: 160

Answers (1)

G. Grothendieck

Reputation: 270268

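Just remove the double quotes from the strings before parsing them (read.csv treats only double quotes as quote characters by default, so the single quotes are harmless):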

ll <- gsub('"', '', ll)

NOTE: Changed answer after poster gave an example of how it goes wrong.
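
If the quote characters need to be kept rather than dropped, one alternative is to skip read.csv and pull the fields out with the same regular expression. This is only a minimal sketch: it assumes ll holds the strings shown in the question, and the column names title and score are illustrative, not from the original post.

```r
# Sample input, copied from the question.
ll <- c("  3. Wiki: Los Angeles 3:58pm; score:1.959502",
        "  4. Wiki: Boston 6:58pm; score:1.959502",
        "  5. Disambiguation: 'Boon; score:1.934644",
        "  6. Wiki: The Note (album)\"; score:1.786931")

# Same pattern as in the question: group 2 is the text, group 3 is the score.
pattern <- "^  [0-9]*\\. (Wiki|Disambiguation): (.*); score:([0-9.]*)$"

# Extract each group directly, so no CSV quoting is involved
# and embedded quotes pass through unchanged.
df <- data.frame(
  title = sub(pattern, "\\2", ll),
  score = as.numeric(sub(pattern, "\\3", ll)),
  stringsAsFactors = FALSE
)
```

Because the text never passes through a CSV parser, single and double quotes inside it need no escaping. Note that sub returns a string unchanged when the pattern does not match, so lines in a different format would end up with NA scores.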

Upvotes: 1
