fly36
fly36

Reputation: 83

how to count distinct value over two dimension using SAS

I have a dataset looks like the following. This dataset contains four variable Country name Country, company ID Company, Year and Date.

Country  Company  Year  Date 
-------  -------  ----  ----
A        1        2000     2000/01/02
A        1        2001     2001/01/03
A        1        2001     2001/07/02
A        1        2000     2001/08/03
B        2        2000     2001/08/03
C        3        2000     2001/08/03

I know how to count number of distinct company in each country. I did it using the following code.

proc sql;
   create table lib.count as
   select country, count(distinct company) as count 
   from lib.data
   group by country;
quit;

My problem is how to count the number of distinct company-Years in each country. Essentially i want to know how many different company or same company in different year. If there are two observation for the same company in the same year, I want to count it as 1 different value. If same company have two observation in differeny year I want to count it as two different value. I want the output looks like the following (one number per country):

Country  No. firm_year
A        2
B        1
C        1

Can anyone can teach me how to do it please.

Upvotes: 1

Views: 1553

Answers (5)

user5693820
user5693820

Reputation: 29

 proc sort data=lib.data out=temp nodupkey; 
   by country company year; 
 run;


 data firm_year(keep=country cnt_fyr);
   set out;
   by country company year
   retain cnt_fyr;
   if first.country then cnt_fyr=1;
   else cnt_fyr+1;
   if last.country;
 run;

Upvotes: 1

Dominic Comtois
Dominic Comtois

Reputation: 10401

You need nested queries, as @DaBigNikoladze hinted at...

  • An "internal" query which will generate a list of distinct combinations of Country + Company + Year;
  • An "external" query which will count how many rows per country are present in the internal query.

Generate dataset

data have;
  informat Country $1.
           Company 1.
           Year 4.
           Date YYMMDD10.;
  format Date YYMMDDs10.;
  input country company year date;
  datalines;
A 1 2000 2000/01/02
A 1 2001 2001/01/03
A 1 2001 2001/07/02
A 1 2000 2001/08/03
B 2 2000 2001/08/03
C 3 2000 2001/08/03
;

Execute query

PROC SQL;
  CREATE TABLE want AS 
    SELECT country, Count(company) AS Firm_year 
      FROM (SELECT DISTINCT country, company, year FROM have) 
    GROUP  BY country; 
QUIT;

Results

Country Firm_year 
A       2 
B       1 
C       1 

Upvotes: 2

DaBigNikoladze
DaBigNikoladze

Reputation: 661

A quick method is to concatenate all the variables you want to compare, creating a new variable. Something like:

data data_mod;
    set data;
    length company_year $ 20;
    company_year= cats(company,year);
run;

Then you can run your proc sql with count(distinct company_year).

Upvotes: 3

gowerc
gowerc

Reputation: 1099

Guess i'm a bit confused as to what you are expecting the result to look like. Here is an sql method that gets the same result as posted by the other answer so far.

data temp;
    attrib Country length = $10;
    attrib Company length = $10;
    attrib Year length = $10;
    attrib Date length = $10;
    input Country $ Company $ Year $ Date $;
    infile datalines delimiter = '@';
    datalines;
    A@1@x@x1@
    A@1@x@x2@
    B@2@x@x1@
    C@3@x@x3@
    ;
run;


proc sql;
   create table temp2  as
   select country,  count(distinct Date) as count 
   from temp
   group by country,  company;
quit;

Upvotes: 0

andrey_sz
andrey_sz

Reputation: 751

The answer for your first question is:

data lib.count(keep=country companyCount);
   set lib.data;
   by country;
   retain companyList '';
   retain companyCount 0;
   if first.country then do;
      companyList = company;
      companyCount = 1;
   end;
   else do;
      if ^index(companyList, company) then do;
         companyList = cats(companyList,',',company);
         companyCount + 1;
      end;
   end;
   if last.country then output;
run;

The resutl is:

Country  companyCount
-------  ------------
A        2
B        1
C        1

Similary you will take the number of distinct company-Years in each country.

Upvotes: 0

Related Questions