desmond.carros
desmond.carros

Reputation: 362

How to import single columned multiple observations data in SAS/R?

There is a given dummy dataset. I am trying to import it into the SAS environment and i also tried it in RStudio as well. But i was unable to grab the correct output. the dataset provided is as follows:

1    "name, age, salary, zipcode"
2    "A, 1, 100, 10010 B, 2, 200, 10011 C, 3, 300, 10012 D, 4, 400, 10014"

i copied the data from a CSV file. The line 1 was in the First cell of the CSV and Line 2 was in the second cell just below the first cell.

PS : 10010 is the zipcode where as B is the Name that is B is the starting of next observation and similiarly C is the next observation and D is the next observation.

The Desired output is:

Name   Age   Salary   Zipcode
 A      1     100      10010
 B      2     200      10011
 C      3     300      10012
 D      4     400      10014

i am stuck with this problem since a day ago. I am just able to move around the dataset in order to move the observations. Moreover the double qouted values are making this problem very complex. A working Solution on SAS or R would work great.

EDIT: The following code has been tried by me in SAS. Thanks SMW for pointing. :)

data new;
length Name $2 Age 8. Salary 8. Zipcode 8.;
infile 'book1.csv' dsd dlm = ',' firstobs = 2 LRECL =17 ;
input Name $ Age Salary Zipcode @@;
run;

Regards

Upvotes: 1

Views: 282

Answers (2)

PereG
PereG

Reputation: 1846

Import the column names and prepare it

nam <- read.table("csv.csv", header=F, nrow=1, stringsAsFactors=F)
nam <- gsub(",", "", nam)
nam <- nam[2]                             # resolve " " in original data 
nam <- strsplit(nam, split=" ")           #

Capitalize the colnames

library(Hmisc)                   # edited
nam <- capitalize(nam)           # edited

Import the remaining file

dd <- read.table("csv.csv", skip=1, sep="", stringsAsFactors=F)

Reorder the file, extract first column

col_1 <- gsub(",", "", as.vector(dd[seq(from=1, to=16,by=4)])[1,])

Extract the remaining columns

col_all <- as.data.frame(t(matrix(dd, ncol=4)))

Clean commas

col_all <- apply(apply(col_all[-1], 2, gsub, patt=",", replace=""), 2, as.numeric)

and set the final object

data <- as.data.frame(cbind(col_1, col_all))
names(data) <- nam
data  # ouput

  Name Age Salary Zipcode
1    A   1    100   10010
2    B   2    200   10011
3    C   3    300   10012
4    D   4    400   10014

Edited: Capitalize only with base functions. Avoid library(Hmisc)and the following code line.

nam <- unlist(lapply(nam, function(x) {                    
   paste(toupper(substring(x, 1, 1)), substring(x, 2, max(nchar(nam))), sep="")
}))

Upvotes: 3

Tom
Tom

Reputation: 51621

You can ask SAS to treat comma, blank and quote as delimiters and use trail @@ to allow it to read multiple observations from one line.

Let's build your example data file.

filename example temp;
data _null_;
  file example;
  put '"name, age, salary, zipcode"'
    / '"A, 1, 100, 10010 B, 2, 200, 10011 C, 3, 300, 10012 D, 4, 400, 10014"'
 ;
run;

And now read it.

data want;
  length Name $2 Age 8. Salary 8. Zipcode 8.;
  infile example dlm = ' ,"' firstobs = 2 ;
  input Name $ Age Salary Zipcode @@;
run;

Upvotes: 0

Related Questions