Reputation: 83
I have a dataset looks like the following. This dataset contains four variable Country name Country
, company ID Company
, Year
and Date
.
Country Company Year Date
------- ------- ---- ----
A 1 2000 2000/01/02
A 1 2001 2001/01/03
A 1 2001 2001/07/02
A 1 2000 2001/08/03
B 2 2000 2001/08/03
C 3 2000 2001/08/03
I know how to count number of distinct company in each country. I did it using the following code.
proc sql;
create table lib.count as
select country, count(distinct company) as count
from lib.data
group by country;
quit;
My problem is how to count the number of distinct
company-Years in each country. Essentially i want to know how many different company or same company in different year. If there are two observation for the same company in the same year, I want to count it as 1 different value. If same company have two observation in differeny year I want to count it as two different value. I want the output looks like the following (one number per country):
Country No. firm_year
A 2
B 1
C 1
Can anyone can teach me how to do it please.
Upvotes: 1
Views: 1553
Reputation: 29
proc sort data=lib.data out=temp nodupkey;
by country company year;
run;
data firm_year(keep=country cnt_fyr);
set out;
by country company year
retain cnt_fyr;
if first.country then cnt_fyr=1;
else cnt_fyr+1;
if last.country;
run;
Upvotes: 1
Reputation: 10401
You need nested queries, as @DaBigNikoladze hinted at...
data have;
informat Country $1.
Company 1.
Year 4.
Date YYMMDD10.;
format Date YYMMDDs10.;
input country company year date;
datalines;
A 1 2000 2000/01/02
A 1 2001 2001/01/03
A 1 2001 2001/07/02
A 1 2000 2001/08/03
B 2 2000 2001/08/03
C 3 2000 2001/08/03
;
PROC SQL;
CREATE TABLE want AS
SELECT country, Count(company) AS Firm_year
FROM (SELECT DISTINCT country, company, year FROM have)
GROUP BY country;
QUIT;
Country Firm_year
A 2
B 1
C 1
Upvotes: 2
Reputation: 661
A quick method is to concatenate all the variables you want to compare, creating a new variable. Something like:
data data_mod;
set data;
length company_year $ 20;
company_year= cats(company,year);
run;
Then you can run your proc sql
with count(distinct company_year)
.
Upvotes: 3
Reputation: 1099
Guess i'm a bit confused as to what you are expecting the result to look like. Here is an sql method that gets the same result as posted by the other answer so far.
data temp;
attrib Country length = $10;
attrib Company length = $10;
attrib Year length = $10;
attrib Date length = $10;
input Country $ Company $ Year $ Date $;
infile datalines delimiter = '@';
datalines;
A@1@x@x1@
A@1@x@x2@
B@2@x@x1@
C@3@x@x3@
;
run;
proc sql;
create table temp2 as
select country, count(distinct Date) as count
from temp
group by country, company;
quit;
Upvotes: 0
Reputation: 751
The answer for your first question is:
data lib.count(keep=country companyCount);
set lib.data;
by country;
retain companyList '';
retain companyCount 0;
if first.country then do;
companyList = company;
companyCount = 1;
end;
else do;
if ^index(companyList, company) then do;
companyList = cats(companyList,',',company);
companyCount + 1;
end;
end;
if last.country then output;
run;
The resutl is:
Country companyCount
------- ------------
A 2
B 1
C 1
Similary you will take the number of distinct company-Years in each country.
Upvotes: 0