Reputation: 3198
I need to extract the lines of text without header and upto the line ,,,,,Total
INPUT
Sr.No ,Description of Goods ,HSN ,,City ,Unit ,,Unit Price ,Amount (in BP)
1 ,29550762R3C0-Top- ST22E360 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
2 ,29550762R3C0-Bot- ST22E361 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
3 ,2945486609-G2 - Bot Stencil - ST22F199 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
4 ,2945457600-S1 - Top Stencil - ST22F224 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
5 ,2945457600-G2 - Bot Stencil - ST22F225 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
,,,,,Total ,,,25000.00 IN
,,,,,Discount ,,,0.00 3
,,,,Total Amount ,Before Tax ,,,"₹ 25,000.00 "
,,,,,IGST 0% ,,,₹ 0.00
,,,,Total Amount ,After Tax: ,,,"₹ 25,000.00 "
EXPECTED OUTPUT
1 ,29550762R3C0-Top- ST22E360 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
2 ,29550762R3C0-Bot- ST22E361 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
3 ,2945486609-G2 - Bot Stencil - ST22F199 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
4 ,2945457600-S1 - Top Stencil - ST22F224 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
5 ,2945457600-G2 - Bot Stencil - ST22F225 ,84439990 ,,1.00 ,Numbers ,,5000.00 ,5000.00
I have tried with: =REGEXEXTRACT(A10,"(\n.*)")
, The number of lines are not known in advance and can vary from cell to cell. Some may have one and some have 5 to 6.
Upvotes: 0
Views: 118
Reputation: 133538
With your shown samples Only, please try following solutions. Here is the Online demo for shown regex.
Google sheet formula with regex: THANKS to @Jvdv for such a nice formula, credit to him for throwing light on this one.
=FLATTEN(SPLIT(REGEXEXTRACT(A1,"((?:\n[0-9]+\s+,[0-9a-zA-Z]+-.*?-.*?,[0-9]+(?:.*?,){2}[0-9]+(?:\.[0-9]+)?\s+,Numbers\s+(?:.*?,){2}[0-9]+(?:\.[0-9]+)?\s+,[0-9]+(?:\.[0-9]+)?\b\s?)+)"),CHAR(10),1,1))
Only regex solution: Where we can use regex inside REGEXEXTRACT
function but since it generates only 1 capturing group so keeping whole regex into one capturing group and then we need to split it with new lines, which is why ABOVE solution is kept as an whole solution.
(^[0-9]+\s+,[0-9a-zA-Z]+-.*?-.*?,[0-9]+(?:.*?,){2}[0-9]+(?:\.[0-9]+)?\s+,Numbers\s+(?:.*?,){2}[0-9]+(?:\.[0-9]+)?\s+,[0-9]+(?:\.[0-9]+)?\b)
Explanation: Adding detailed explanation for above regex.
( ##starting main capturing group.
^[0-9]+\s+, ##Matching digits from starting of value followed by spaces followed by comma.
[0-9a-zA-Z]+- ##Matching alphanumeric 1 or more occurrences followed by -
.*?-.*?, ##Using Lazy match till - followed by another lazy match till comma.
[0-9]+ ##Matching digits 1 or more occurrences here.
(?:.*?,){2} ##In a non-capturing group using lazy match till comma with its 2 occurrences.
[0-9]+ ##Matching digits 1 or more occurrences here.
(?:\.[0-9]+)? ##Followed by dot followed by 1 or more digits to handle float matches keeping it optional.
\s+, ##Matching 1 or more occurrences of spaces followed by comma.
Numbers\s+ ##Matching string Numbers followed by spaces 1 or more occurrences.
(?:.*?,){2} ##In a non-capturing group matching till comma and making its 2 occurrences.
[0-9]+ ##Matching digits 1 or more occurrences here.
(?:\.[0-9]+)? ##Followed by dot followed by 1 or more digits to handle float matches keeping it optional.
\s+, ##Matching 1 or more occurrences of spaces followed by comma.
[0-9]+(?:\.[0-9]+)? ##Matching 1 or more digits followed by dot followed by 1 or more digits to handle float matches keeping it optional.
\b ##Using word boundary to make sure previous match it exact match.
) ##closing opened capturing group in starting.
Upvotes: 3
Reputation: 75870
My two cents, assuming:
Formula in A3
:
=QUERY(FLATTEN(SPLIT(A1,CHAR(10))),"where Col1 matches '^\d.+'")
Upvotes: 1
Reputation: 206151
I would simply match the lines which start with digits,space,comma till the end of line:
/^\d+ *,.*$/gm
^ Start of line
\d+ One or more digits
* Zero or more spaces
, Comma
.* Any character
$ End of line
/gm Flags: Global; Multi line
Upvotes: 0