Reputation: 1633
I have the following text input.
Group 1,Good,LEADS,"Leads Description 1
Leads Description 2","Note 1
Note 2",1,100,210,10,Amt,15%
Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1
Switching Note 2",4,130,210,15,Amt,15%
Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
The description and the note can each sit on a single line or span multiple lines. The input above contains 3 records in total. When a description or note spans multiple lines, the text is wrapped in double quotes ("").
For records without a multi-line description or note, a simple explode works, but it breaks as soon as either field spans multiple lines. I am using the following statement to split the text into lines:
preg_split("/\n|\r\n?/", $text);
This statement splits on line breaks correctly; it only needs to handle one more condition: text between double quotes must be treated as part of a single line.
Edit: the text above is assigned to $text.
Upvotes: 1
Views: 86
Reputation: 47894
You could use (*SKIP)(*FAIL) to consume and ignore double-quoted substrings, then split only on the newlines that were not consumed earlier. I'll follow the newline escape sequence (\R) with \s* to effectively left-trim the lines.
Code: (Demo)
$text = <<<TEXT
Group 1,Good,LEADS,"Leads Description 1
Leads Description 2","Note 1
Note 2",1,100,210,10,Amt,15%
Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1
Switching Note 2",4,130,210,15,Amt,15%
Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
TEXT;
var_export(preg_split('~"[^"]*"(*SKIP)(*FAIL)|\R\s*~', $text));
Output:
array (
0 => 'Group 1,Good,LEADS,"Leads Description 1
Leads Description 2","Note 1
Note 2",1,100,210,10,Amt,15% ',
1 => 'Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1
Switching Note 2",4,130,210,15,Amt,15%',
2 => 'Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%',
)
Admittedly, this technique will not do well if your text contains any escaped double quotes, but then AterLux's answer will suffer in the same fashion.
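To make that caveat concrete, here is a minimal sketch that runs the same pattern on two made-up inputs (the sample strings and variable names are my own assumptions, not from the question). Doubled quotes ("") keep the quote count balanced, so the record stays in one piece; a backslash-escaped quote (\") leaves the count unbalanced, so the newline inside the field is treated as a record break.

```php
<?php
// Pattern from the answer above: skip quoted substrings, split on the remaining newlines.
$pattern = '~"[^"]*"(*SKIP)(*FAIL)|\R\s*~';

// Hypothetical input with doubled quotes ("") inside a multi-line quoted field.
$doubled = "a,\"say \"\"hi\"\"\nthere\",1\nb,plain,2";
$doubledRows = preg_split($pattern, $doubled);

// Hypothetical input with a backslash-escaped quote (\") inside a multi-line quoted field.
$escaped = "a,\"say \\\"hi\nthere\",1\nb,plain,2";
$escapedRows = preg_split($pattern, $escaped);

var_export($doubledRows); // 2 elements: the multi-line record stays intact
var_export($escapedRows); // 3 elements: the first record is cut at its inner newline
```

If backslash-escaped quotes can occur in your data, the quoted-substring part of the pattern would need to be extended to account for them.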
Alternatively, if you don't want to rely on the double-quoted substrings AND your new rows always start with Group, then a space, then an integer, then a comma, you could go for: (Demo)
var_export(preg_split('~\R\h*(?=Group \d+,)~', $text, 0, PREG_SPLIT_NO_EMPTY));
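Once the records are separated, a likely next step is breaking each one into fields. A plain explode on commas would cut through quoted values, so the sketch below hands each record to str_getcsv() instead. The shortened sample text and the $records / $rows names are assumptions made for illustration only.

```php
<?php
// Shortened, hypothetical sample in the same shape as the question's input.
$text = "Group 1,Good,LEADS,\"Leads Description 1\nLeads Description 2\",\"Note 1\nNote 2\",1,100\n"
      . "Group 2,Service,LICENCE,Licence Description 1,Licence Note 1,2,200";

// Split into records with the lookahead pattern shown above.
$records = preg_split('~\R\h*(?=Group \d+,)~', $text, 0, PREG_SPLIT_NO_EMPTY);

// Parse each record into fields; delimiter, enclosure and escape are passed explicitly.
$rows = [];
foreach ($records as $record) {
    $rows[] = str_getcsv($record, ',', '"', '\\');
}

var_export($rows);
```

Each quoted description or note comes back as a single field with its line break preserved and the surrounding quotes removed.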
Upvotes: 0
Reputation: 4654
Instead of splitting, try matching each whole record with a regular expression:
<?php
$s = 'Group 1,Good,LEADS,"Leads Description 1
Leads Description 2","Note 1
Note 2",1,100,210,10,Amt,15%
Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1
Switching Note 2",4,130,210,15,Amt,15%
Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
';
// Each match is one record: a run of unquoted text and quoted chunks (which may span lines).
if (preg_match_all('/([^\r\n"]+|"[^"]*")+/', $s, $pregres)) {
    print_r($pregres[0]);
}
output:
Array
(
[0] => Group 1,Good,LEADS,"Leads Description 1
Leads Description 2","Note 1
Note 2",1,100,210,10,Amt,15%
[1] => Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1
Switching Note 2",4,130,210,15,Amt,15%
[2] => Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
)
Regex explanation
([^\r\n"]+|"[^"]*")+
Inside the parentheses there are two options (separated by the alternation operator |):
[^\r\n"]+ - matches a sequence of characters that are NOT a carriage return, line feed or double quote. That matches unquoted text until it hits a line break or a quote.
"[^"]*" - matches a sequence that starts and ends with a double quote and contains any characters in between except quotes. That consumes a whole quoted string, including all line breaks inside the quotes.
The two options are grouped in parentheses and the whole group is allowed to repeat (by the + following the parentheses). This consumes the whole record until there is a newline outside quotes.
Repeated quotes (e.g. "this is a ""quoted"" string") are also consumed, because each pair of quotes is matched as its own quoted chunk.
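For readers who prefer to see the explanation next to the pattern, here is a sketch of the same expression written in extended mode (the x flag), where whitespace and # comments inside the pattern are ignored. The sample string and variable names are made up for illustration; it includes a doubled-quote value to show the last point.

```php
<?php
// Same pattern as above, spread out in extended mode (x flag) so each part can be commented.
$pattern = '/
    (                # one record is a repeated run of:
        [^\r\n"]+    #   unquoted characters, up to a line break or a quote
      |              #   or
        "[^"]*"      #   a quoted chunk, which may contain line breaks
    )+
/x';

// Hypothetical input: the first record has a multi-line field with doubled quotes.
$s = "id 1,\"first \"\"quoted\"\" line\nsecond line\",10\nid 2,plain,20\n";

preg_match_all($pattern, $s, $matches);
print_r($matches[0]); // two records; the first one keeps its inner line break
```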
Upvotes: 1