Reputation: 1130
I'm trying to parse a PDF file and I would like to get the text without words broken across line ends, e.g.:
text.pdf
"hello guys I ne-
ed help"
How can I remove the "-" and the line break so that both parts of "need" are joined together?
This is my current code:
require 'pdf-reader'

reader = PDF::Reader.new('text.pdf')
reader.pages.each do |page|
  page.text.each_line do |line|
    words = line.split(" ") # => ["hello", "guys", "I", "ne-"], then ["ed", "help"]
    words.each do |word|
      puts word
    end
  end
end
Upvotes: 1
Views: 815
Reputation: 52357
You can use String#gsub:
a = "hello guys I ne-
ed help"
#=> "hello guys I ne-\n" + "ed help"
a.gsub(/-|\n/, '-' => '', "\n" => '')
#=> "hello guys I need help"
With your code:
reader = PDF::Reader.new('text.pdf')
reader.pages.each do |page|
  # gsub returns a new string, so print (or store) the result to actually use it
  page.text.each_line { |line| print line.gsub(/-|\n/, '-' => '', "\n" => '') }
end
Or, if the dash and the newline always occur together, replace them as a single match:
a.gsub(/-\n/, '')
#=> "hello guys I need help"
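If the text also contains ordinary hyphens or lines ending in stray spaces, a slightly stricter pattern may be safer. The sketch below is only an illustration: `dehyphenate` is a hypothetical helper name, not part of PDF::Reader, and it assumes a hyphen should be dropped only when it sits between two word characters across a line break.

```ruby
# Hypothetical helper (not part of PDF::Reader): joins words that were
# split by a hyphen at the end of a line.
def dehyphenate(text)
  # Match a hyphen only when it is preceded by a word character, followed by
  # optional spaces/tabs, a line break, optional indentation, and then
  # another word character. Standalone dashes like "a - b" are left alone.
  text.gsub(/(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)/, '')
end

sample = "hello guys I ne-\ned help"
puts dehyphenate(sample)
#=> hello guys I need help
```

Note that this still joins genuine compounds that happen to break at a line end (for example "well-" followed by "known" becomes "wellknown"), so it is a heuristic rather than a complete fix.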
Upvotes: 1