Reputation: 9
I have a dataset with many fields. I am trying to summarize the "price" data by averages for year ranges. For example:
Tried:
proc sql;
select avg(price) as avg_price
FROM summary
WHEN year between 1995 and 2000;
quit;
The code above did not work. Can you please help me with the code? Please include the proc and quit statements or anything else I need; I am new to SAS/SQL.
Upvotes: 0
Views: 198
Reputation: 27498
In SQL you can group by a case expression or by a computed variable (aka column). In SAS PROC SQL the averaging function can be written as MEAN; AVG works as well.
Example grouping by computed column:
data have;
do date = '01jan1900'd to '31dec2020'd;
year = year(date);
yearChar = put(year,4.);
price = exp ((date - '01jan1940'd) / (365*12) );
output;
end;
format date yymmdd10.;
run;
proc sql;
create table want as
select
case
when year between 1900 and 1924 then '1900 to 1924'
when year between 1925 and 1949 then '1925 to 1949'
when year between 1950 and 1974 then '1950 to 1974'
when year between 1975 and 1999 then '1975 to 1999'
when year between 2000 and 2017 then '2000 to 2017'
else 'out of range'
end
as years
, mean (price) as average_price
from have
group by years
having years not in ('out of range')
;
quit;
This creates a data set such as
years ($12)     average_price (double)
1900 to 1924      0.120
1925 to 1949      0.967
1950 to 1974      7.777
1975 to 1999     62.546
2000 to 2017    345.873
When the year variable is a character type, you need to convert the value to numeric and use the converted value in the between expression.
Example: yearChar is a character column containing the year values. The input function converts a character string to a numeric value (if possible). The question mark (?) modifier suppresses the log message when the conversion fails (such as when the year is **** or UNKN); the result is then a missing value.
proc sql;
create table want as
select
case
when input(yearChar,?4.) between 1900 and 1924 then '1900 to 1924'
when input(yearChar,?4.) between 1925 and 1949 then '1925 to 1949'
when input(yearChar,?4.) between 1950 and 1974 then '1950 to 1974'
when input(yearChar,?4.) between 1975 and 1999 then '1975 to 1999'
when input(yearChar,?4.) between 2000 and 2017 then '2000 to 2017'
else 'out of range'
end
as years
, mean (price) as average_price
from have
group by years
having years not in ('out of range')
;
quit;
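To see what the conversion does with bad values, here is a minimal sketch. The data set year_check and its four sample values are made up for illustration and are not part of the question's data.

```sas
/* Hypothetical sample values: two valid years and two invalid strings */
data year_check;
  length yearChar $4;
  do yearChar = '1999', '2005', '****', 'UNKN';
    output;
  end;
run;

proc sql;
  /* input() with the ? modifier returns a missing value (.) */
  /* for strings that cannot be read as a number             */
  select yearChar,
         input(yearChar, ?4.) as yearNum
  from year_check;
quit;
```

A missing value sorts below every number in SAS, so it fails each between test in the case expression and the row lands in 'out of range'.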
Upvotes: 0
Reputation: 1269503
I assume you really mean 1920-1924, 1925-1929, and so on, so that each year is counted once.
You can use group by and arithmetic:
proc sql;
select floor(year / 5) * 5 as from_year,
avg(price) as avg_price
from summary
group by from_year;
quit;
If you want the end year as well:
proc sql;
select floor(year / 5) * 5 as from_year,
floor(year / 5) * 5 + 4 as to_year,
avg(price) as avg_price
from summary
group by from_year, to_year;
quit;
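To check how the arithmetic assigns years to buckets, here is a small sketch that applies it to a few sample years. The data set name bucket_check and the sample years are assumptions chosen for illustration.

```sas
/* Hypothetical check data: a few sample years, not from the question */
data bucket_check;
  do year = 1994, 1995, 1997, 1999, 2000;
    from_year = floor(year / 5) * 5;  /* first year of the 5-year bucket */
    to_year   = from_year + 4;        /* last year of the same bucket    */
    output;
  end;
run;

proc print data=bucket_check noobs;
run;
```

Here 1995, 1997 and 1999 all fall in the 1995 to 1999 bucket, 1994 falls in 1990 to 1994, and 2000 starts the 2000 to 2004 bucket.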
Upvotes: 1
Reputation: 1297
The error you get seems to indicate that the variable year is a character string rather than numeric. The following conversion should help:
PROC SQL;
SELECT mean(price) as average FROM have
WHERE 1995 <= input(year,8.) <= 2000 ;
quit;
Upvotes: 0
Reputation: 133360
If you need an average per year, then you need to group by year:
proc sql;
select year, avg(price) as avg_price
FROM summary
WHERE year between 1995 and 2000
group by year;
quit;
Or, for custom year ranges, a simple way is a union:
proc sql;
select 'from 1940 to 1960' as years, avg(price) as avg_price
from summary
WHERE year between 1940 and 1960
union
select 'from 1960 to 1980', avg(price)
from summary
WHERE year between 1960 and 1980
union
select 'from 1980 to 2000', avg(price)
from summary
WHERE year between 1980 and 2000;
quit;
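Note that between includes both endpoints, so in the ranges above the years 1960 and 1980 each contribute to two averages. If every year should be counted only once, one possible variant is sketched below. It assumes the same summary table with numeric year and price columns; the range boundaries are my own choice for illustration.

```sas
proc sql;
  /* Non-overlapping ranges: each year belongs to exactly one branch */
  select 'from 1940 to 1959' as years, avg(price) as avg_price
  from summary
  where year between 1940 and 1959
  union all
  select 'from 1960 to 1979', avg(price)
  from summary
  where year between 1960 and 1979
  union all
  select 'from 1980 to 2000', avg(price)
  from summary
  where year between 1980 and 2000;
quit;
```

union all is used here because the branches cannot produce duplicate rows that need removing; plain union would also work.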
Upvotes: 1