Reputation: 1685
I'm trying to parse the comma-separated values in these four example lines:
1,'Tambaú','Praça Santo António','Tambaú','12x0',2,'I','EM',12,6,5934,50
2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio','12x0',2,'I','EM',12,0,7249,0
3,'Cabo Branco','Cabo Branco, Prox. Rua Alice de Almeida','Cabo Branco','12x0',2,'I','EO',12,0,4751,0
901,'teste','teste','teste','Mini-estação de demonstração',1,'I','EO',2,1,97,50
I am using the regex ('?.*?'?),
in Ruby. The first and last lines parse the way I want. The problem with the 2nd and 3rd is that the name contains a comma (Av. Beira Rio, Prox. Av Odilon Coutinho and Cabo Branco, Prox. Rua Alice de Almeida), so my regex splits it. For example, I get Av. Beira Rio and Prox. Av Odilon Coutinho as two separate values, which is not what I want.
EDIT: I should have specified that this is not from a CSV file. These are the arguments to a function call in a web page's source code.
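To make the failure concrete, here is a minimal sketch that applies the regex above with String#scan to a shortened version of the second line. Both the use of scan and the trimmed input are assumptions for illustration; the question does not say how the regex is applied.

```ruby
# Shortened version of the second example line (assumption: trimmed for brevity).
line = "2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio'"

# The lazy .*? stops at the first comma it can reach, even inside quotes.
fields = line.scan(/('?.*?'?),/).flatten

fields.each { |f| p f }
```

The quoted address comes out as two pieces, and the final field is skipped because no comma follows it.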
Upvotes: 2
Views: 115
Reputation: 27855
You may use CSV and set :quote_char => "'"
to handle the separator inside your fields:
#encoding: utf-8
require 'csv'
input = <<data
1,'Tambaú','Praça Santo António','Tambaú','12x0',2,'I','EM',12,6,5934,50
2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio','12x0',2,'I','EM',12,0,7249,0
3,'Cabo Branco','Cabo Branco, Prox. Rua Alice de Almeida','Cabo Branco','12x0',2,'I','EO',12,0,4751,0
901,'teste','teste','teste','Mini-estação de demonstração',1,'I','EO',2,1,97,50
data
CSV.new(input, :quote_char => "'").each { |data|
  p data.size
  p data
}
If your source is an Array rather than a String, you need a small adaptation:
#encoding: utf-8
require 'csv'
regexArr = [
  ["1,'Tambaú','Praça Santo António','Tambaú','12x0',2,'I','EM',12,6,5934,50"],
  ["2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio','12x0',2,'I','EM',12,0,7249,0"],
  ["3,'Cabo Branco','Cabo Branco, Prox. Rua Alice de Almeida','Cabo Branco','12x0',2,'I','EO',12,0,4751,0"],
  ["901,'teste','teste','teste','Mini-estação de demonstração',1,'I','EO',2,1,97,50"]
]
regexArr.each do |loc|
  CSV.new(loc.first, :quote_char => "'").each do |data|
    p data
  end
end
As an alternative you may build a String:
input = regexArr.flatten.join("\n")
CSV.new(input, :quote_char => "'").each { |data|
  p data.size
  p data
}
Both approaches expect an array of one-element arrays.
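If the source turns out to be a flat array of strings instead (one argument string per entry, as might be extracted from the page source), a possible sketch uses CSV.parse_line on each entry. The `rows` array and its contents are assumptions for illustration, reusing two of the example lines.

```ruby
require 'csv'

# Hypothetical flat array: one argument string per entry, not nested.
rows = [
  "2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio','12x0',2,'I','EM',12,0,7249,0",
  "3,'Cabo Branco','Cabo Branco, Prox. Rua Alice de Almeida','Cabo Branco','12x0',2,'I','EO',12,0,4751,0"
]

# parse_line handles a single line and returns one array of fields.
parsed = rows.map { |row| CSV.parse_line(row, quote_char: "'") }

parsed.each { |fields| p fields }
```

Every field comes back as a String with the surrounding single quotes removed; numeric fields would still need converting (for example with to_i) if integers are wanted.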
Upvotes: 4
Reputation: 3960
If you want to do it with regex, you could do something like:
^(([^,]*)(,|$))*
and then read the captured groups.
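Note that `[^,]*` cannot contain a comma, so the pattern above still splits quoted fields at their inner commas. A possible regex sketch that treats a single-quoted field as one unit is shown here. `FIELD` and `split_fields` are hypothetical names, and the sketch assumes there are no escaped quotes inside fields and no empty unquoted fields (an empty unquoted field would be skipped).

```ruby
# Either a single-quoted field (may contain commas) or a run of non-comma characters.
FIELD = /'([^']*)'|([^,]+)/

# Hypothetical helper: returns the fields of one line, without the quotes.
def split_fields(line)
  # scan yields [quoted, bare] pairs; exactly one of the two is non-nil.
  line.scan(FIELD).map { |quoted, bare| quoted || bare }
end

line = "2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio','12x0',2,'I','EM',12,0,7249,0"
p split_fields(line)
```

The quoted alternative is listed first so that a field starting with a quote is consumed up to its closing quote before the non-comma alternative gets a chance to match.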
Upvotes: 2
Reputation: 21700
Good luck parsing context-free stuff with regex. Your data looks like CSV.
CSV.parse("901,'teste','teste','teste','Mini-estação de demonstração',1,'I','EO',2,1,97,50")
=> [["901",
"'teste'",
"'teste'",
"'teste'",
"'Mini-estação de demonstração'",
"1",
"'I'",
"'EO'",
"2",
"1",
"97",
"50"]]
Upvotes: 1
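The output above keeps the single quotes because CSV's default quote character is the double quote, so `'` is treated as ordinary text. A short sketch of what that means for a row whose quoted field contains a comma, compared with passing `quote_char: "'"`; the variable names are assumptions for illustration.

```ruby
require 'csv'

line = "2,'Beira Rio','Av. Beira Rio, Prox. Av Odilon Coutinho','Beira Rio','12x0',2,'I','EM',12,0,7249,0"

# Default settings: single quotes are plain characters, so every comma splits.
with_default = CSV.parse_line(line)

# Single quote as the quote character: the comma inside the address is kept.
with_single_quote = CSV.parse_line(line, quote_char: "'")

p with_default.size
p with_single_quote.size
p with_single_quote[2]
```

With the default settings the address is split into two fields, so the row has one field too many; with the single-quote setting it stays in one piece.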