Reputation: 95
I'm looking to upload a ".txt" file into SAS so that I can search the contents for specific characters and words to analyse. The text file in question is poorly formatted, so ideally I would like to have one column, with each word being a new observation, like:
TEXT
1 Hello
2 World
Currently I'm reading the file into SAS, but there are lots of spaces and multiple words per observation.
data mylib.textimport;
infile "../TEXTTEST.txt" dlm="' ', ',', '.'";
input __text__ $char300. ;
run;
Could anyone help me with how to put every word into a new observation?
Thanks in advance. :)
Upvotes: 0
Views: 574
Reputation: 51621
If you want to read the file "word by word", just tell SAS which characters you consider to be delimiters and use the FLOWOVER option to read the words. So if you wanted to treat spaces, commas, periods, quotes, tabs, linefeeds and carriage returns as word delimiters, your program could look like this:
data want;
dlm=' ,."''' || '090A0D'x;
infile "../TEXTTEST.txt" dlm=dlm flowover;
length word $300 ;
input word @@ ;
run;
Upvotes: 2
Reputation: 12465
I would TRANSLATE the characters you want out into spaces and then loop over the remaining text, outputting each word.
Here's some test data:
data have;
format line $200.;
input ;
line = _infile_;
datalines;
This is some, test.text
How,about,this cheesey.cheese?
;
run;
Here is a DATA Step to loop through and output what you are looking for:
data want(keep=word);
format word $200.;
set have;
line = translate(line," ",",."); /*convert , and . to space*/
n = countw(line);
/*Loop through the words and output*/
do i=1 to n;
word = scan(line,i);
output;
end;
run;
TRANSLATE converts the characters in the third argument into the corresponding characters in the second. Think of the string as an array of characters: the replacement is applied to each element of the array.
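To see the replacement in isolation, here is a minimal sketch. The variable names before and after are made up for illustration. Because the second argument is shorter than the third, SAS pads it with blanks, so both the comma and the period end up as spaces.

```sas
/* Minimal TRANSLATE sketch: variable names are illustrative only */
data _null_;
  before = 'a,b.c';
  /* 2nd argument = replacement characters, 3rd = characters to replace */
  after = translate(before, ' ', ',.');
  put before= after=;   /* writes both values to the log */
run;
```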
As the test data shows (the ? stays attached to cheese?), you probably want to think about other punctuation.
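One possible way to deal with other punctuation, sketched under assumptions: skip the TRANSLATE step and pass an explicit delimiter list to COUNTW and SCAN. The data set name want2, the variable dlms, and the particular set of delimiters are all hypothetical choices, so adjust the list to whatever appears in your file.

```sas
/* Same test data as above, repeated so the sketch runs on its own */
data have;
  format line $200.;
  input;
  line = _infile_;
datalines;
This is some, test.text
How,about,this cheesey.cheese?
;
run;

/* Sketch: explicit delimiter list instead of the SCAN defaults.
   The list in dlms is an assumption, not a complete set. */
data want2(keep=word);
  length word $200;
  set have;
  dlms = ' ,.?!;:"''()';          /* '' inside single quotes is one quote */
  do i = 1 to countw(line, dlms); /* count words using the same list */
    word = scan(line, i, dlms);   /* pull out the i-th word */
    output;
  end;
run;
```

The same delimiter list is given to both functions so that the word count and the extraction agree with each other.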
Upvotes: 1