Reputation: 1236
Although I checked this link, I am still struggling to get a formatted SAS file into R. I have a formatted .sas7bdat file (attached here), but when I import it into R, all the formats are lost.
I tried two different approaches:
## Code 1:
##========
library(haven)
data <- read_sas("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", NULL)
## Code 2:
##========
library(sas7bdat)
data("sas7bdat.sources")  # not needed to read a file
data <- read.sas7bdat("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", debug = FALSE)
table(data$SEX) # gives me 1 and 2 instead of males and females
# 1 2
#880916 799960
# Then I tried this (I have a SAS format catalog, formats.sas7bcat, so I passed it to read_sas):
#===============================================================================
data<- read_sas("C:/Users/mmr2011/OneDrive/OneDrive/R codes/df_nsclc1.sas7bdat", catalog_file = "C:/Users/mmr2011/OneDrive/OneDrive/R codes/formats.sas7bcat")
# table(data$SEX)
# 1 2
#50190 66064
However, I need the values to appear as they do in SAS, as follows:
I am using the SAS catalog folder in Windows shown below (screenshot No. 5). It is also available here.
Any advice would be greatly appreciated.
Upvotes: 1
Views: 3795
Reputation: 63424
I think the issue, most likely, is a misunderstanding of how value labels work in R.
When I use the following SAS code:
libname temp 'h:\temp\';
proc format lib=temp;
value sexf
1='Female'
2='Male'
;
value racef
1='Black'
2='Asian'
3='White'
4='Other'
;
value hispf
1='Of Hispanic Origin'
2='Not of Hispanic Origin'
;
quit;
options fmtsearch=(temp);
data temp.rtest;
input sex race hisp;
format sex sexf. race racef. hisp hispf.;
datalines;
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 4 1
2 4 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
1 4 2
2 4 2
;;;;
run;
And then use the following R code:
library(haven)
data <- read_sas("H:/temp/rtest.sas7bdat", catalog_file="H:/temp/formats.sas7bcat")
print(data)
It works as expected: the console prints the label text.
# A tibble: 16 x 3
sex race hisp
<dbl+lbl> <dbl+lbl> <dbl+lbl>
1 1 [Female] 1 [Black] 1 [Of Hispanic Origin]
2 2 [Male] 1 [Black] 1 [Of Hispanic Origin]
3 1 [Female] 2 [Asian] 1 [Of Hispanic Origin]
4 2 [Male] 2 [Asian] 1 [Of Hispanic Origin]
5 1 [Female] 3 [White] 1 [Of Hispanic Origin]
6 2 [Male] 3 [White] 1 [Of Hispanic Origin]
7 1 [Female] 4 [Other] 1 [Of Hispanic Origin]
8 2 [Male] 4 [Other] 1 [Of Hispanic Origin]
9 1 [Female] 1 [Black] 2 [Not of Hispanic Origin]
10 2 [Male] 1 [Black] 2 [Not of Hispanic Origin]
11 1 [Female] 2 [Asian] 2 [Not of Hispanic Origin]
12 2 [Male] 2 [Asian] 2 [Not of Hispanic Origin]
13 1 [Female] 3 [White] 2 [Not of Hispanic Origin]
14 2 [Male] 3 [White] 2 [Not of Hispanic Origin]
15 1 [Female] 4 [Other] 2 [Not of Hispanic Origin]
16 2 [Male] 4 [Other] 2 [Not of Hispanic Origin]
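To confirm the labels were imported without relying on how the tibble prints, you can inspect them directly. This is a minimal sketch: instead of reading the SAS file, it builds a small labelled vector by hand with haven's labelled() (an assumption standing in for a column returned by read_sas() with a catalog file). It also shows why table() still reports 1 and 2.

```r
library(haven)

# Hypothetical stand-in for data$sex as returned by
# read_sas(..., catalog_file = ...): numeric codes plus value labels.
sex <- labelled(c(1, 2, 1, 2), labels = c(Female = 1, Male = 2))

# The value labels are stored in the "labels" attribute;
# the data themselves stay numeric.
attr(sex, "labels")
print_labels(sex)

# table() counts the underlying codes, so it shows 1 and 2.
table(sex)

# Converting to a factor first makes table() count by label instead.
table(as_factor(sex))
```

So seeing 1 and 2 from table() does not by itself mean the formats were lost.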
However, if I open it in RStudio's viewer by double-clicking the dataset in the Data pane, the value labels are not shown, and that viewer is what you pasted a picture of in the question. I don't believe the viewer supports value labels: variable labels (the column-header labels) are displayed, but value labels are not. If you want to confirm that, you may want to ask a new question specifically about it, with the code here cleaned up (you're welcome to use my example code).
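One way to get readable values in the viewer is to convert the labelled columns to factors before opening the data. The sketch below assumes haven's as_factor() method for data frames, and uses a small hand-built tibble (hypothetical) in place of the data returned by read_sas().

```r
library(haven)

# Hypothetical stand-in for the imported data: two labelled numeric columns.
df <- tibble::tibble(
  sex  = labelled(c(1, 2, 2), labels = c(Female = 1, Male = 2)),
  race = labelled(c(1, 3, 4),
                  labels = c(Black = 1, Asian = 2, White = 3, Other = 4))
)

# Convert every labelled column to a factor whose levels are the value labels.
df_f <- as_factor(df)
str(df_f)

# In RStudio, View(df_f) should then show "Female"/"Male" rather than 1/2.
```

This keeps the original labelled data untouched, so you can still use the numeric codes from `df` when you need them.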
What you will probably want to do is convert the value-labelled variables to factors. This can be done in several ways. The labelled package documentation discusses why, and that package is one option for doing it, but there are others. Again, this would be a good separate question if you can't figure it out on your own. Factors are how R typically represents this kind of data (i.e., categorical variables).
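If you would rather not depend on a package for the conversion, base R's factor() can build the factor from the "labels" attribute. This is a minimal sketch: to_label_factor is a hypothetical helper name, and the input vector is constructed by hand (an assumption) rather than read from SAS. It should also work on haven's labelled columns, since it only reads the "labels" attribute and the underlying numbers.

```r
# Hypothetical helper: turn a numeric vector carrying a "labels" attribute
# (as haven attaches from a SAS format) into a base-R factor.
to_label_factor <- function(x) {
  labs <- attr(x, "labels")
  if (is.null(labs)) return(x)  # no value labels: leave the vector unchanged
  # Strip class and attributes, then map each code to its label.
  # Codes without a label become NA.
  factor(as.vector(unclass(x)), levels = unname(labs), labels = names(labs))
}

sex <- structure(c(1, 2, 2, 1), labels = c(Female = 1, Male = 2))
sex_f <- to_label_factor(sex)
table(sex_f)
```

One design difference: this helper turns unlabelled codes into NA, whereas haven::as_factor() with its default settings keeps them as extra levels.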
Upvotes: 2