Reputation:
I have a string in my DB that represents notes for a user. I want to split this string up so I can separate each note into the content, user, and date.
Here is the format of the String:
"Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"
I need to break this into an array of
["Example Note", "Josh Test", "12:53 PM 8/14/12", "Another example note", "John Doe", "12:00 PM 9/15/12", "Last Example Note", "Joe Smoe", "1:00 AM 10/12/12"]
I am still experimenting with this. Any ideas are very welcome, thank you! :)
Upvotes: 4
Views: 602
Reputation: 29281
You could use regex for a simpler approach.
s = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"
s.split(/\s+<i>|<\/i><br><br>\s?|(?<!on) (?=\d)/)
=> ["Example Note", "Josh Test", "12:53 PM on 8/14/12", "Another example note", "John Doe", "12:00 PM on 9/15/12", "Last Example Note", "Joe Smoe", "1:00 AM on 10/12/12"]
The date-time elements still contain the word "on", so they do not match the requested format exactly, but it may be acceptable to reformat them in a separate step.
Edit: Removed an unnecessary + character.
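The separate formatting step could look something like the sketch below. It assumes the split always yields groups of three (note, user, timestamp) and that the only thing to remove from each timestamp is the word "on"; the names parts and formatted are just for illustration.

```ruby
s = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"

parts = s.split(/\s+<i>|<\/i><br><br>\s?|(?<!on) (?=\d)/)

# Assumption: elements come in groups of three (note, user, timestamp).
formatted = parts.each_slice(3).flat_map do |note, user, stamp|
  # Strip stray whitespace from the note and drop "on" from the timestamp.
  [note.strip, user, stamp.sub(' on ', ' ')]
end

puts formatted.inspect
```

If a note ever lacks its italic user/date part, the grouping by three would shift, so this is only suitable when the stored format is consistent.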
Upvotes: 3
Reputation: 552
Maybe this could be useful:
require 'date'
require 'time'
text = "Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>"
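notes = text.split('<br><br>')
pro_notes = []
notes.each do |note_e|
  # Split each note into its content and the "name time on date</i>" part.
  notes_temp = note_e.split('<i>')
  # Assumes a two-word name, e.g. ["Josh", "Test", "12:53", "PM", "on", "8/14/12</i>"].
  words = notes_temp[1].split(' ')
  date = words[5].gsub('</i>', '')
  full_name = words[0] + ' ' + words[1]
  nn = notes_temp[0].strip
  # Include the AM/PM marker (words[3]) so afternoon times are not parsed as morning.
  dt = DateTime.strptime("#{words[2]} #{words[3]} #{date}", '%I:%M %p %m/%d/%y')
  pro_notes << [full_name, nn, dt]
end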
Upvotes: 0
Reputation: 6346
You can use Nokogiri to parse out the required text using XPath/CSS selectors. As a bare-bones example to get you started, the following maps the content of every <i> tag to an element of an array:
require 'nokogiri'
html = Nokogiri::HTML("Example Note <i>Josh Test 12:53 PM on 8/14/12</i><br><br> Another example note <i>John Doe 12:00 PM on 9/15/12</i><br><br> Last Example Note <i>Joe Smoe 1:00 AM on 10/12/12</i><br><br>")
my_array = html.css('i').map {|text| text.content}
#=> ["Josh Test 12:53 PM on 8/14/12", "John Doe 12:00 PM on 9/15/12", "Joe Smoe 1:00 AM on 10/12/12"]
With the CSS selector you could just as easily do something like:
require 'nokogiri'
html = Nokogiri::HTML("<h1>My Message</h1><p>Hi today's date is: <time>Friday, May 31st</time></p>")
message_header = html.css('h1').first.content #=> "My Message"
message_body = html.css('p').first.content #=> "Hi today's date is: Friday, May 31st" (content includes the text of nested tags)
message_sent_at = html.css('p > time').first.content #=> "Friday, May 31st"
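To get from the <i> contents in the first example to separate user and date fields, one option is to run a regular expression over each string. This is only a sketch: it assumes every entry looks like "name time AM/PM on date", the array literal stands in for the my_array built with Nokogiri, and the names pattern and entries are made up for illustration.

```ruby
# Stand-in for the my_array produced by html.css('i').map { ... }.
my_array = ["Josh Test 12:53 PM on 8/14/12", "John Doe 12:00 PM on 9/15/12", "Joe Smoe 1:00 AM on 10/12/12"]

# Assumed layout: "<name> <h:mm> <AM|PM> on <m/d/yy>".
pattern = /\A(?<user>.+?) (?<time>\d{1,2}:\d{2} [AP]M) on (?<date>\d{1,2}\/\d{1,2}\/\d{2})\z/

entries = my_array.map do |text|
  m = pattern.match(text)
  # Leave strings that do not follow the assumed layout untouched.
  m ? [m[:user], "#{m[:time]} #{m[:date]}"] : [text]
end

puts entries.inspect
```

Unlike splitting on spaces, the named captures do not depend on the user name being exactly two words.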
Upvotes: 1