zigloo99

Reputation: 134

Advice on a regex for a CSV where free-text messages are not quoted

I have a generated CSV in which one column holds free-text comments. The comments are not quoted, and they can contain commas and new lines.

"Regular expression for csv with commas and no quotes" is a similar question, but the asker there doesn't have line breaks or additional columns to parse through.

A line of text in the csv can look like this:

    1, 15231, 123123, 1231, word word word, YYYY-MM-DD HH:mm:ss.sss, 13453, **This would be the section with any character for users to communicate and the db stores and 
new lines to record communication**, YYYY-MM-DD HH:mm:ss.sss, User name, 12412413, 01231231, 123,12,,*ASTERIX USED*, YYYY-MM-DD HH:mm:ss.sss

After that comes another new line, and the next record follows the same layout as the one above.

So far I've tried this

/(\d+?),(\d+?),(\d+?),(\d+?),(.+?),(.+?),(.+?),(.+?),(.+?),(.+?),(.+?),(.+?),(.+?),(.+?),(.+(?=,\d{4})),

But I can't seem to get past the cases where the comments section of the CSV itself contains date references.

I'm fairly new to regex, and the (?=) lookahead is new to me, as I had to go beyond simple regex patterns.

Upvotes: 0

Views: 124

Answers (1)

Andrew Clark

Reputation: 208665

If you know the exact number of fields that there should be, then you can use the following method:

  • For each "normal" field that will not contain commas, use [^,]*
  • For the user entered field which may contain commas, use .*
  • Separate each field with a comma

For example if you have five total fields and the third is entered by the user, you would use the following regex:

([^,]*),([^,]*),(.*),([^,]*),([^,]*)

Example: http://www.rubular.com/r/E6785bWW0R
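As a rough illustration of the five-field pattern above, here is a minimal Python sketch. The sample line is made up for the demonstration, and the choice of Python and of `fullmatch` is my assumption, not something taken from the question.

```python
import re

# Five fields; only the third (user-entered) field may contain commas.
pattern = re.compile(r"([^,]*),([^,]*),(.*),([^,]*),([^,]*)")

# Hypothetical sample record, invented for this sketch.
line = "1,abc,free text, with, commas,2020-01-01,xyz"

# fullmatch anchors the pattern to the whole string, so the greedy .*
# has to give back exactly enough for the last two comma-free fields.
match = pattern.fullmatch(line)
fields = match.groups()

print(fields[2])  # the user-entered field, commas included
```

This only works because the number of fields on each side of the free-text field is fixed: the `[^,]*` groups pin down the columns before and after it, and `.*` absorbs whatever is left in the middle.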

If the user-entered field may contain line breaks, make sure you enable the option that lets . match line-break characters (often an s flag or a constant like DOTALL; in some languages you can prefix your regex with (?s)). Alternatively, just replace .* with [\s\S]*, which matches any character regardless of the options used.
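To show the line-break handling concretely, here is a small Python sketch comparing the two options. The record is invented for the example, and it assumes you are matching one record at a time.

```python
import re

# Hypothetical record whose third field contains a comma and a line break.
record = "1,abc,first line, with comma\nsecond line,2020-01-01,xyz"

# Without any flag, . does not match "\n", so the pattern fails on this record.
plain = re.compile(r"([^,]*),([^,]*),(.*),([^,]*),([^,]*)")

# Option 1: re.DOTALL makes . match line breaks as well.
dotall = re.compile(r"([^,]*),([^,]*),(.*),([^,]*),([^,]*)", re.DOTALL)

# Option 2: [\s\S]* matches any character without needing a flag.
portable = re.compile(r"([^,]*),([^,]*),([\s\S]*),([^,]*),([^,]*)")

no_flag_result = plain.fullmatch(record)
fields_dotall = dotall.fullmatch(record).groups()
fields_portable = portable.fullmatch(record).groups()
```

One thing to watch for: `[^,]` also matches line breaks, so if you run this against a whole file rather than a single record, the "normal" fields can swallow newlines too. Something like `[^,\n]*` for those fields may be safer in that case.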

Upvotes: 1
