prabu

Reputation: 6171

How do I parse data from a TXT file with a tab separator?

I am using Ruby 1.8.7 and Rails 2.3.8. I want to parse data from a tab-separated TXT dump file.

The TXT dump contains some CSS properties, and these appear to be treated as invalid data.


I run the following code using the FasterCSV gem:

  FasterCSV.foreach(txt_file, :quote_char => '"', :col_sep => '\t', :row_sep => :auto, :headers => :first_row) do |row|
    col = row.to_s.split(/\t/)
    puts col[15]
  end

The console shows the error "Illegal quoting on line 38." Can anyone suggest how to skip the rows that contain invalid data and continue loading the remaining rows?

Upvotes: 1

Views: 1405

Answers (3)

johnf

Reputation: 445

The problem is that TSV files don't have a quote character. The specification simply says that you aren't allowed to have tabs in the data.

The CSV library doesn't really support this use case. I've worked around it by specifying a quote character that I know won't appear in my data. For example:

# txt_file is a path, so use foreach (CSV.parse expects the file contents as a string).
CSV.foreach(txt_file, :quote_char => '☎', :col_sep => "\t") do |row|
  puts row[15]
end
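
To see the effect in isolation, here is a minimal sketch that parses an in-memory string rather than a file. It assumes a Ruby version where the standard `csv` library is available. The sample data and the `parse_tsv` helper are made up for illustration. The second line contains a stray double quote, which would normally trigger the quoting error.

```ruby
require 'csv'

# Hypothetical sample data: the row with id 1 contains a stray double quote.
SAMPLE_TSV = "id\tstyle\n1\twidth: 5\" wide\n2\tcolor: red\n"

# Parse tab-separated text, using a quote character ('☎') that is
# assumed never to appear in the data, so '"' is treated as plain text.
def parse_tsv(text)
  CSV.parse(text, :col_sep => "\t", :quote_char => '☎')
end

parse_tsv(SAMPLE_TSV).each { |row| puts row.inspect }
```

The catch is that you have to be sure the chosen character really never occurs in the data.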

Upvotes: 1

htanata

Reputation: 36944

Here's one way to do it. We drop to a lower level, using shift to parse each row, then silence the MalformedCSVError exception and continue with the next iteration. The drawback is that the loop doesn't look very nice. If anyone can improve this, you're welcome to edit the code.

FasterCSV.open(filename, :quote_char => '"', :col_sep => "\t", :headers => true) do |csv|
  row = true
  while row
    begin
      row = csv.shift
      break unless row

      # Do things with the row here...
    rescue FasterCSV::MalformedCSVError
      next
    end
  end
end

Upvotes: 3

Tudor Constantin

Reputation: 26861

Just read the file as a regular text file (not with FasterCSV) and split each line on \t like you do now, and it should work.
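
A rough sketch of that idea, assuming the first line is a header and that no field contains a tab or a newline. The helper names `split_tsv_line` and `read_tsv`, and the sample data, are made up for illustration.

```ruby
require 'tempfile'

# Split one tab-separated line into fields.
# The -1 limit keeps trailing empty fields instead of dropping them.
def split_tsv_line(line)
  line.chomp.split("\t", -1)
end

# Read a TSV file line by line, skipping the header row,
# and return an array of field arrays.
def read_tsv(path)
  rows = []
  header_seen = false
  File.foreach(path) do |line|
    unless header_seen
      header_seen = true   # first line is assumed to be the header
      next
    end
    rows << split_tsv_line(line)
  end
  rows
end

# Usage with a throwaway file; the stray '"' is just ordinary text here.
Tempfile.open('dump') do |f|
  f.write("id\tstyle\n1\twidth: 5\" wide\n2\t\n")
  f.flush
  read_tsv(f.path).each { |fields| puts fields.inspect }
end
```

Since nothing interprets quote characters here, a stray `"` cannot raise a quoting error. The trade-off is that genuinely quoted fields are not unwrapped.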

Upvotes: 1
