Mohamed Rahouma

Reputation: 1236

How to import a formatted `.sas7bdat` file into `R` without losing the formats?

I checked this link, but I am still struggling to get a formatted SAS file into R. I have a formatted .sas7bdat file (attached here), but when I import it into R, all the formats are lost. I tried two different approaches:

## Code 1:
##========
library(haven)
data <- read_sas("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", NULL)

## Code 2:
##========
library(sas7bdat)
data("sas7bdat.sources")

data <- read.sas7bdat("C:/Users/mmr2011/OneDrive/R codes/df_nsclc1.sas7bdat", debug = FALSE)

table(data$SEX) # gives me 1 and 2 instead of males and females
#     1      2 
#880916 799960 

# Then I tried this code: I have a SAS format catalog (formats.sas7bcat), so I added it to my earlier call as follows
#===============================================================================
data<- read_sas("C:/Users/mmr2011/OneDrive/OneDrive/R codes/df_nsclc1.sas7bdat", catalog_file = "C:/Users/mmr2011/OneDrive/OneDrive/R codes/formats.sas7bcat") 

# table(data$SEX)
#   1     2 
#50190 66064 

[Screenshots: data imported with haven; data imported without haven]

However, I need the values to appear as they do in SAS, as follows: [screenshot]

I am using a SAS catalog on Windows, shown below (screenshot No. 5). It is also available here. [screenshot]

Any advice will be greatly appreciated.

Upvotes: 1

Views: 3795

Answers (1)

Joe

Reputation: 63424

Most likely, the issue is a misunderstanding of how labels work in R.

When I use the following SAS code:

libname temp 'h:\temp\';
proc format lib=temp;
  value sexf
  1='Female'
  2='Male'
  ;
  value racef
  1='Black'
  2='Asian'
  3='White'
  4='Other'
  ;
  value hispf
  1='Of Hispanic Origin'
  2='Not of Hispanic Origin'
  ;
quit;
options fmtsearch=(temp);
data temp.rtest;
  input sex race hisp;
  format sex sexf. race racef. hisp hispf.;
datalines;
1 1 1
2 1 1
1 2 1
2 2 1
1 3 1
2 3 1
1 4 1
2 4 1
1 1 2
2 1 2
1 2 2
2 2 2
1 3 2
2 3 2
1 4 2
2 4 2
;;;;
run;

And then use the following R code:

library(haven)
data <- read_sas("H:/temp/rtest.sas7bdat", catalog_file="H:/temp/formats.sas7bcat")   
print(data)

It works as expected: the console prints each code together with its value label.

# A tibble: 16 x 3
          sex      race                       hisp
    <dbl+lbl> <dbl+lbl>                  <dbl+lbl>
 1 1 [Female] 1 [Black] 1 [Of Hispanic Origin]    
 2 2 [Male]   1 [Black] 1 [Of Hispanic Origin]    
 3 1 [Female] 2 [Asian] 1 [Of Hispanic Origin]    
 4 2 [Male]   2 [Asian] 1 [Of Hispanic Origin]    
 5 1 [Female] 3 [White] 1 [Of Hispanic Origin]    
 6 2 [Male]   3 [White] 1 [Of Hispanic Origin]    
 7 1 [Female] 4 [Other] 1 [Of Hispanic Origin]    
 8 2 [Male]   4 [Other] 1 [Of Hispanic Origin]    
 9 1 [Female] 1 [Black] 2 [Not of Hispanic Origin]
10 2 [Male]   1 [Black] 2 [Not of Hispanic Origin]
11 1 [Female] 2 [Asian] 2 [Not of Hispanic Origin]
12 2 [Male]   2 [Asian] 2 [Not of Hispanic Origin]
13 1 [Female] 3 [White] 2 [Not of Hispanic Origin]
14 2 [Male]   3 [White] 2 [Not of Hispanic Origin]
15 1 [Female] 4 [Other] 2 [Not of Hispanic Origin]
16 2 [Male]   4 [Other] 2 [Not of Hispanic Origin]

However, if I open it in RStudio's viewer by double-clicking the dataset in the Data pane, the value labels are not shown, and that view is what you pasted into the question as a picture. I don't believe the viewer supports value labels (it does support variable labels, meaning column header labels). If you want to verify that, you may want to ask a new question that specifically mentions it, with the code here cleaned up (you're welcome to use my example code).
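
To make the distinction between variable labels and value labels concrete, here is a minimal sketch, assuming the haven package is installed. The vector is built by hand as a stand-in for an imported column (the data and the variable label "Sex of patient" are hypothetical), so it runs without any SAS file:

```r
library(haven)

# Hand-built stand-in for a column imported with read_sas() and a catalog file.
# The values, value labels, and variable label are made up for illustration.
sex <- labelled(
  c(1, 2, 1),
  labels = c(Female = 1, Male = 2),  # value labels: one per code
  label = "Sex of patient"           # variable label: one per column
)

attr(sex, "label")    # the variable (column-level) label
attr(sex, "labels")   # the value labels, stored as a named numeric vector
print_labels(sex)     # haven's helper for displaying the value labels
```

The codes themselves are unchanged; the labels travel with the column as attributes.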

What you will probably want to do is convert the value labels to factors, which is how R typically handles categorical variables. There are several ways to do this; the labelled package is one option, and its documentation discusses why you would want to. Again, this would make a good separate question if you can't figure it out on your own.
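
One possible way to do the conversion is haven's own `as_factor()`. This is a minimal sketch, assuming the haven package is installed; the vector is built by hand (hypothetical data mirroring the `sexf` format above) rather than read from a file:

```r
library(haven)

# Hand-built stand-in for a labelled column (hypothetical data).
sex <- labelled(c(1, 2, 1, 2), labels = c(Female = 1, Male = 2))

# as_factor() replaces the numeric codes with their value labels.
sex_f <- as_factor(sex)

levels(sex_f)   # the factor levels are the value labels
table(sex_f)    # counts are now reported by label rather than by code
```

Calling `as_factor()` on a whole data frame, for example `as_factor(data)`, converts every labelled column at once.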

Upvotes: 2
