Reputation: 805
I would like to create a SQL INSERT query with a large CSV file I have. The problem is the names are not parsed out. They are listed as "John Doe" or "John B Doe" instead of "John" and "Doe" because I differentiate between first and last name (I am not concerned with a middle initial).
Another issue is the phone number - Most are listed as "555 555 5555", some as not even finished ("555 555"), and some are "5555555555" or "15555555555". Where do I even begin? I have done some Googling and came up short on this specific issue. I am not picky in the language. However, I am most familiar with C# or PHP. I could also probably accomplish it in C++ or BASIC. My question, really, is where to begin. Thanks.
SQL fields: fname, lname, address, city, state, zip, phone(#-###-###-####), email
CSV example: "John M Doe", "156 Blue St", "Nashville", "TN", "23434", "(555) 555-5555", "[email protected]"
CSV example2: "John Doe", "156 Blue St", "Nashville", "TN", "23434", "555 555-5555", "[email protected]"
CSV example3: "John & Jill Doe", "156 Blue St", "Nashville", "TN", "23434", "5555555555", "[email protected]"
Upvotes: 0
Views: 543
Reputation: 32532
if you wanna go the php route...
You can parse the file one row a ta time with with fgetcsv (look at the doc example, it shows example of opening the file and looping through each line).
Most of the columns look like a straight forward thing. Based on your examples the only real problem columns are the name and phone numbers.
name:
One way to get the first and last name and disregard a middle initial or anything else is this:
$name = "John Smith";
$name = explode(" ",$name);
$fname = array_shift($name);
$lname = array_pop($name);
This doesn't account for your "John & Jill Doe" example (will give you "John" and "Doe"), but you're gonna have to draw the line somewhere.. too much arbitrary-ness going on there. For example "John and Jill Doe","John H. & Jill M. Doe" etc..
Phone number:
Easiest way to handle this is to just strip all non-numbers from it:
$phone = "(555) 555 555";
$phone = preg_replace('~[^0-9]~','',$phone);
Then you can format it how you want. On that note.. you mentioned people not entering in enough numbers, e.g. "555-5555" (no country/area code) or "555-555-555" (no country code). Again, there's not much you can do about that, short of just picking random numbers to fill in the blanks. I suppose if you are feeling really ambitious, you can look into a 3rd party service to attempt to get the area code based on city/state values.. But if not, you can start with parsing it right-to-left and then deciding what to do with the blanks. For example, assume "5555555" is really "555-5555" with no country or area code.
Upvotes: 0
Reputation: 398
Assuming you are parsing a CSV to import the data into a database, I would do a bulk insert into an intermediary table and then parse the fields from that table before committing to the real table(s). If you try to read each row in the CSV and parse in the client, you could end up using a use a lot of memory and it will be slow, although there are lots of .net libraries that will help you do this. You can search TheCodePlex for more.
Here is a link on how to use Bulk Insert
http://technet.microsoft.com/en-us/library/ms175915.aspx
To parse from the insert table into the real tables you can use c# or t-sql. You'll just have to loop through and parse each field according to your rules. There are probably a millions way to do that. I would just start experimenting. You might look into using RegEx:
http://regexlib.com/?AspxAutoDetectCookieSupport=1
Or
Parsing numeric strings (or something like this -- my Google search was c# parse string) http://msdn.microsoft.com/en-us/library/xbtzcc4w.aspx
Upvotes: 0