Reputation: 362
There is a given dummy dataset. I am trying to import it into the SAS environment and i also tried it in RStudio as well. But i was unable to grab the correct output. the dataset provided is as follows:
1 "name, age, salary, zipcode"
2 "A, 1, 100, 10010 B, 2, 200, 10011 C, 3, 300, 10012 D, 4, 400, 10014"
i copied the data from a CSV file. The line 1 was in the First cell of the CSV and Line 2 was in the second cell just below the first cell.
PS : 10010 is the zipcode where as B is the Name that is B is the starting of next observation and similiarly C is the next observation and D is the next observation.
The Desired output is:
Name Age Salary Zipcode
A 1 100 10010
B 2 200 10011
C 3 300 10012
D 4 400 10014
i am stuck with this problem since a day ago. I am just able to move around the dataset in order to move the observations. Moreover the double qouted values are making this problem very complex. A working Solution on SAS or R would work great.
EDIT: The following code has been tried by me in SAS. Thanks SMW for pointing. :)
data new;
length Name $2 Age 8. Salary 8. Zipcode 8.;
infile 'book1.csv' dsd dlm = ',' firstobs = 2 LRECL =17 ;
input Name $ Age Salary Zipcode @@;
run;
Regards
Upvotes: 1
Views: 282
Reputation: 1846
Import the column names and prepare it
nam <- read.table("csv.csv", header=F, nrow=1, stringsAsFactors=F)
nam <- gsub(",", "", nam)
nam <- nam[2] # resolve " " in original data
nam <- strsplit(nam, split=" ") #
Capitalize the colnames
library(Hmisc) # edited
nam <- capitalize(nam) # edited
Import the remaining file
dd <- read.table("csv.csv", skip=1, sep="", stringsAsFactors=F)
Reorder the file, extract first column
col_1 <- gsub(",", "", as.vector(dd[seq(from=1, to=16,by=4)])[1,])
Extract the remaining columns
col_all <- as.data.frame(t(matrix(dd, ncol=4)))
Clean commas
col_all <- apply(apply(col_all[-1], 2, gsub, patt=",", replace=""), 2, as.numeric)
and set the final object
data <- as.data.frame(cbind(col_1, col_all))
names(data) <- nam
data # ouput
Name Age Salary Zipcode
1 A 1 100 10010
2 B 2 200 10011
3 C 3 300 10012
4 D 4 400 10014
Edited: Capitalize only with base functions. Avoid library(Hmisc)
and the following code line.
nam <- unlist(lapply(nam, function(x) {
paste(toupper(substring(x, 1, 1)), substring(x, 2, max(nchar(nam))), sep="")
}))
Upvotes: 3
Reputation: 51621
You can ask SAS to treat comma, blank and quote as delimiters and use trail @@ to allow it to read multiple observations from one line.
Let's build your example data file.
filename example temp;
data _null_;
file example;
put '"name, age, salary, zipcode"'
/ '"A, 1, 100, 10010 B, 2, 200, 10011 C, 3, 300, 10012 D, 4, 400, 10014"'
;
run;
And now read it.
data want;
length Name $2 Age 8. Salary 8. Zipcode 8.;
infile example dlm = ' ,"' firstobs = 2 ;
input Name $ Age Salary Zipcode @@;
run;
Upvotes: 0