Reputation: 1
"6/23/2015 8:55:36 AM,6/23/2015 7:57:55 AM,test,A B,""C, D "",E-MA,F,Personal G,G one,test - TWO THREE,""I LIKE APPLE"",""ONE, TWO"",FCB,6/27/2015 - 6/27/2016,6/23/2015,BIZ,Personal,MA,NY,Personal,Group,""NYC Ins. Companies"",,,""$NYC NY-MA, (Group)"",""$NYC NY-MA, (Group)"",,,,"
The data looks like this but with 10k rows. I was first trying to use PROC IMPORT but it didn't recognize the comma(,) as a delimiter in the file. Then I tried DATA STEP INFILE but still didn't work out the problem.
Does anyone have any experience importing files with text qualifiers?
Thank you.
update
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18 VAR19 VAR20 VAR21 VAR22 VAR23 VAR24 VAR25 VAR26 VAR27 VAR28 VAR29 VAR30
6/23/2015 8:55|6/23/2015 7:57| test| A |B C, D | E-MA |F |Personal G |G one| test - TWO THREE| I LIKE APPLE |ONE, TWO |FCB 6/27/2015 - 6/27/2016 |6/23/2015| BIZ| Personal |MA |NY| Personal |Group NYC Ins. Companies | | |$NYC NY-MA, (Group)| $NYC NY-MA, (Group) ||||
Upvotes: 0
Views: 2116
Reputation: 9569
I think you might have better luck with proc import
if you make an intial pass through the file to strip out the extraneous double quotes:
data _null_;
infile "original_file.csv" lrecl = 32000;
file "new_file.csv";
input;
_infile_ = tranwrd(substr(_infile_,2,length(_infile_) - 1),'""','"');
put _infile_;
run;
The idea is pretty simple - read in the whole line, remove the first and last characters (assuming that these are always double quotes), and then replace double double quotes with double quotes.
This may cause some further problems if any of your text fields are actually supposed to contain double quotes, but otherwise it should generate a file that should be slightly easier to import directly, either via proc import or by using the dsd
option on an appropriate infile statement.
Upvotes: 0