How to convert text to columns and ignore some delimiters

I've tried to copy/paste a price list from a PDF into an excel sheet, but everything pastes into a single column (A), rather than converting it to the appropriate number of columns.

Using text to columns with a space delimiter doesn't work because many of the products have multiple spaces in the product name.

I've got about 300 lines with varying lengths of item description. There should be 4 columns: Item number, description, qty and price. But all 4 columns may have varying numbers of delimiters and characters.

Example:

MGB123 Brand 20 Mil Business Card Magnet 500 .225
NNR1221 Brand 15 Mil Custom Shaped Marker and Clip 250 4.123
PQXRF123 Brand 10 Mil Pen 1000 .221
etc...

I've used:

=MID(A1,FIND("~~~~~",SUBSTITUTE(A1," ","~~~~~",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))+1,LEN(A1))

to get the price separated, but how would I alter the formula to get the next to last column?

Upvotes: 0

Views: 1000

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

Here are formulas that should work on earlier versions of Excel also:

Item Number: =LEFT(A2,FIND(" ",A2)-1)
Description: =MID(A2,FIND(" ",A2)+1,FIND(CHAR(1), SUBSTITUTE(A2," ",CHAR(1),LEN(A2)-LEN(SUBSTITUTE(A2," ",""))-1))-FIND(" ",A2))
Qty:         =--TRIM(MID(SUBSTITUTE(A2," ",REPT(" ",99)),99*2,99))
Price:       =--TRIM(RIGHT(SUBSTITUTE(A2," ",REPT(" ",99)),99))

The double unary at the beginning of Qty and Price are to convert the string to a number.

enter image description here

Upvotes: 1

norie
norie

Reputation: 9857

You can use these 4 formulas to extract the item number, name, quantity and price.

=FILTERXML("<t><s>"& SUBSTITUTE(A1," ","</s><s>")&"</s></t>", "//s[1]")

=TEXTJOIN(" ",TRUE, FILTERXML("<t><s>"& SUBSTITUTE(A1," ","</s><s>")&"</s></t>", "//s[position()>1 and position()<last()-1]"))

=FILTERXML("<t><s>"& SUBSTITUTE(A1," ","</s><s>")&"</s></t>", "//s[last()-1]")

=FILTERXML("<t><s>"& SUBSTITUTE(A1," ","</s><s>")&"</s></t>", "//s[last()]")

If TEXTJOIN isn't available in your version of Excel you could try using CONCAT.

=CONCAT(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>", "//s[position()>1 and position()<last()-1]")&" ")

Which, again depending on your version of Excel, might need to be entered as an array formula using CTRL+SHIFT+ENTER.

If CONCAT is unavailable try this,

=MID(A1,FIND(" ",A1)+1,FIND(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>", "//s[last()-1]"),A1)-FIND(" ",A1)+2)

though I'm sure there's probably a better way.:)

Upvotes: 2

Related Questions