Reputation: 680
I have data which doesn't appear to have consistent spacings or positioning. It looks like:
1675 C Street , Suite 201
Anchorage AK 99501
61.205475 -149.886882
600 Azalea Road
Mobile AL 36609
30.656824 -88.148781
1601 Harbor Bay Parkway , Suite 150
Alameda CA 94502
37.726114 -122.240546
1900 Point West Way, Suite 270
Sacramento CA 95815
38.5994175 -121.4315844
3600 Wilshire Blvd., Suite 1500
Los Angeles CA 90010
34.06153 -118.303463
From this I'd like to extract the street address, city name, state, zip code, lat, and long. I thought the following code would work, but it produces very weird results.
data voa;
input Address $50.;
input City $ State $ Zip;
input Latitude Longitude;
datalines;
I think the issue comes from the fact that there isn't consistent spacing or positioning of the elements.
Upvotes: 1
Views: 90
Reputation: 9109
Your data will work fine using LIST input you just need to add the "look for double delimiter option" & to CITY plus it need to be a bit longer $16 or so.
input City &$16. State $ Zip;
Upvotes: 3
Reputation: 9569
In the absence of consistent delimiters or fixed width fields, this is easier to do using scan
:
data want;
infile cards truncover;
length STATE $2 CITY $32;
input Address $50.;
input;
ZIP = input(scan(_INFILE_, -1),5.);
STATE = scan(_INFILE_, -2);
CITY = trim(substr(_INFILE_,1,index(_INFILE_,STATE) - 1));
input Latitude Longitude;
cards;
1675 C Street , Suite 201
Anchorage AK 99501
61.205475 -149.886882
600 Azalea Road
Mobile AL 36609
30.656824 -88.148781
1601 Harbor Bay Parkway , Suite 150
Alameda CA 94502
37.726114 -122.240546
1900 Point West Way, Suite 270
Sacramento CA 95815
38.5994175 -121.4315844
3600 Wilshire Blvd., Suite 1500
Los Angeles CA 90010
34.06153 -118.303463
;
run;
Upvotes: 1