Reputation: 786
I have data that's formatted this way, as a single string:
"1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500"
In a Ruby app, I'd like to create two arrays: one of the substrings between each numbered list marker and the dash, and one of the substrings containing the numbers between the dash and the newlines (as integers), like so:
labels = ["Enloe Medical Center","CSU Chico","Walmart Distribution Center","Pacific Coast Producers (Agribusiness)","Marysville School District","Feather River Hospital","Sunsweet Growers (Agriculture)","YRC (Freight Services)","Sierra Pacific Industries (Lumber Products)","Colusa Casino Resort"]
numbers = [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500]
I'm not so great with my regexes; I know how to do substitutions and matching, but I'm not sure where to start with this. Can anyone help?
Upvotes: 0
Views: 231
Reputation: 168081
labels, numbers = string.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/).transpose
numbers.map!{|s| s.gsub(",", "").to_i}
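In case the transpose looks like magic, here is a minimal sketch on a shortened two-line input (the sample string and the intermediate name pairs are only for illustration, not part of the answer above). scan with two capture groups returns one [label, number] pair per matching line, and transpose turns that array of pairs into one array of labels and one array of number strings.

```ruby
# Shortened sample input, assumed to have the same shape as the question's data.
string = "1. Enloe Medical Center - 2,000\n2. CSU Chico - 1,805"

# With two capture groups, scan returns one [label, number] pair per matching line.
pairs = string.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/)

# transpose flips rows and columns: one array of labels, one of number strings.
labels, numbers = pairs.transpose

# Drop the thousands separators, then convert each string to an Integer.
numbers = numbers.map { |s| s.delete(",").to_i }

p pairs    # [["Enloe Medical Center", "2,000"], ["CSU Chico", "1,805"]]
p labels   # ["Enloe Medical Center", "CSU Chico"]
p numbers  # [2000, 1805]
```

One caveat: transpose raises on an empty result only if the rows have unequal length, but with no matches at all it returns an empty array, so labels and numbers would both be nil after the destructuring assignment.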
Upvotes: 3
Reputation: 1420
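Two things make this easy.
First, in Ruby ^ and $ match at the start and end of every line, so a regexp written for one line can be scanned over the whole string. (/pat/m additionally treats a newline as a character matched by . but the patterns here do not depend on it.)
Second, grouping (example in the 2nd part).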
You write a regexp for one line, and it fits the whole string:
r1 = /[\d,]+\s*$/m
str.scan r1
# => ["2,000", "1,805", "1,350", "1,200", "1,000", "865", "600", "500", "500", "500"]
[\d,] - a digit or a comma
+ - how many times: one or more
\s* - whitespace, 0 or more times (any trailing spaces end up in the match)
$ - matches the end of a line
P.S. Since you know how to do substitutions, I haven't converted these to numbers.
r2 = /\d+\.\s*([\w\s()]+?)\s*-/m
str.scan(r2).flatten
\d+ - matches a digit, 1 or more times
\. - matches a literal . (you must escape it, because an unescaped . matches any character)
\s* - whitespace, 0 or more times
[\w\s()]+? - any word character, whitespace, or parenthesis, 1 or more times, as few as possible, so the spaces before the dash stay out of the label
() - you are grouping; it's an easy way to say "I want this, surrounded by this". More here: regexp ruby - capturing
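Putting the two parts together, here is a minimal sketch of the difference grouping makes to scan. It assumes the lines carry no trailing whitespace and that labels contain only word characters, spaces, and parentheses; the names number_re and label_re and the three-line sample are mine. Without a group, scan returns the whole matches; with a group, it returns one array of captures per match, which is why flatten is needed.

```ruby
# Three sample lines in the same shape as the question's data.
str = [
  "1. Enloe Medical Center - 2,000",
  "4. Pacific Coast Producers (Agribusiness) - 1,200",
  "6. Feather River Hospital - 865"
].join("\n")

# No capture group: scan returns each whole match (digits and commas at end of line).
number_re = /[\d,]+$/
# One capture group: scan returns [[label], [label], ...], so flatten it afterwards.
label_re = /\d+\.\s*([\w\s()]+?)\s*-/

numbers = str.scan(number_re).map { |s| s.delete(",").to_i }
labels  = str.scan(label_re).flatten

p numbers  # [2000, 1200, 865]
p labels   # ["Enloe Medical Center", "Pacific Coast Producers (Agribusiness)", "Feather River Hospital"]
```

A label that itself contains a hyphen would be cut short by label_re, so treat this as a sketch for data shaped like the sample.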
Upvotes: 1
Reputation: 89547
You can do this:
rawlines = <<EOF
1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500
EOF
labels = []
numbers = []
rawlines.scan(/^[0-9]+\. ([^-]+) - ([1-9][0-9]{0,2}(?>,[0-9]{3})*)/) do |label, number|
  labels << label
  numbers << number.gsub(",", "").to_i
end
puts labels
puts numbers
Note that this part of the pattern, ([1-9][0-9]{0,2}(?>,[0-9]{3})*), can be replaced by ([0-9,]+) if you don't need to check that the commas sit at the thousands positions.
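To make the trade-off between the two number patterns concrete, here is a small sketch that tests each one against a few strings. The \A and \z anchors, the constant-like names strict and loose, and the sample strings are my additions so that each pattern has to match the entire string.

```ruby
# Strict: 1 to 3 leading digits (no leading zero), then groups of ",ddd".
strict = /\A[1-9][0-9]{0,2}(?>,[0-9]{3})*\z/
# Loose: any non-empty run of digits and commas.
loose = /\A[0-9,]+\z/

samples = ["2,000", "865", "1,23", ",,5"]

# Collect [sample, strict?, loose?] for each string.
results = samples.map { |s| [s, strict.match?(s), loose.match?(s)] }

results.each do |s, is_strict, is_loose|
  puts "#{s.inspect}: strict=#{is_strict} loose=#{is_loose}"
end
# "2,000": strict=true loose=true
# "865": strict=true loose=true
# "1,23": strict=false loose=true
# ",,5": strict=false loose=true
```

For clean input like the list in the question both behave the same; the strict form only matters if malformed numbers might show up.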
Upvotes: 0
Reputation: 9752
str = %{1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500}
numbers = str.scan(/-\ (\d.*)$/).flatten.map{|s| s.gsub(",", "").to_i}
# => [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500]
labels = str.scan(/\d+\.\s(.*)\s-/).flatten
# => ["Enloe Medical Center", "CSU Chico", "Walmart Distribution Center", "Pacific Coast Producers (Agribusiness)", "Marysville School District", "Feather River Hospital", "Sunsweet Growers (Agriculture)", "YRC (Freight Services)", "Sierra Pacific Industries (Lumber Products)", "Colusa Casino Resort"]
Upvotes: 0
Reputation: 5290
s = "1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500"
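# Numbers: take everything after the dash, drop the commas, convert to Integer.
arr1 = s.each_line.map { |x|
  x.match(/- (.*)/)[1].delete(",").to_i
}
# Labels: the text between the list marker and the dash.
arr2 = s.each_line.map { |x|
  x.match(/\d+\. (.*) - /)[1]
}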
puts arr1
puts arr2
Upvotes: 0