FionaS
FionaS

Reputation: 75

Sort same data into columns

EDIT: Thanks for all the responses everyone. I'm going to go ahead and try and write rules to cover as many of the cases as I can, and either manually extract or try to right more rules to cover everything else.

I am trying to sort the same "types" of data into the same columns. Essentially, I get a data dump where a bunch of data (year, company name, person name, IO number, PO number, project description, and a bunch of comments) dumps into one single column, like this:

enter image description here

The ideal end result would be sorting so that same type of data in the same columns, i.e. all years in column A, all IOs in column B, all POs in column C, all person names in column D, all company names in column E, and whatever is left is dumped into a "comments" section in column F.

I've written a macro that employs the SUBSTITUTE function so that it goes through this string and substitutes all dashes and backslashes with commas, then separates based on the comma delimitor, then re-pastes the text as plain-text. This works fairly well, except for in the occasional case where there are dashes in the name of a company or a backslash to indicate two people who own that IO/PO or when all of the data is entered in without any delimitor such as: 2012 Company project title IO ##### PO #### Person Name.

So here is what I am asking: 1. Is there a better way to parse the data than I am doing now? How can I accommodate for the exceptions such as a dash in the company name or a string where there are no dashes or backslashes, only spaces? 2. Once I have parsed all of this data and separated it into separate columns like so: enter image description here how do I sort it so that the same type of information is in the same column?

Any help would be greatly appreciated. Please let me know if anything was unclear.

Upvotes: 1

Views: 945

Answers (2)

pnuts
pnuts

Reputation: 59450

Along the lines of at @Werner “You can’t make a silk purse …” Obviously the solution is to lean on whoever is responsible for the garbage in to ensure that your source data is in better shape. However I guess you are looking for a workaround. From your example, some ‘tiding’ is possible. Eg sort on ColumnB and where 2012 is in ColumnC exchange the contents of B and C for that row. Then sort on ColumnD and do much that same for D and E. If ColumnF contains Quote insert a blank cell and shift to the right. If ColumnF is blank exchange contents of that row with ColumnD. Move ColumnD to the end. Select anything before Quote in ColumnF and remove it to ColumnE if that is empty, otherwise to ColumnH. The result should look something like:

SO18131636 example

-rather better than I was expecting and I’d guess about the limit of what could reasonably be programmed.

Upvotes: 1

stenci
stenci

Reputation: 8481

Welcome to StackOverflow!

  1. If the text follows clear rules, like a separator as "-" or "," you can use the Split() function to get an array of tokens. If the text doesn't follow any rule it's impossible. Very likely you are in the middle, where most of the texts follow the rules. For the other texts, you need to massage your code and try to find new rules and check them with... see below.

  2. Create a few functions IsYear(), IsPO(), IsCompany() that return True if the content is recognized. The functions could be as simple as IsYear = Text Like "20##" or could contain many tests. Then you make a function that checks each cell of each row, and sorts if required.

I'm sorry I can't give you anything more than some generic advice, but this is a very open question for a very challenging problem.

I hope this gets you started.

Upvotes: 1

Related Questions