zboy21

Reputation: 9

How to calculate for average by year ranges?

I have a dataset with many fields. I am trying to summarize the "price" data by averages for year ranges. For example:

Tried:

proc sql;
select avg(price) as avg_price
FROM summary
WHEN year between 1995 and 2000;
quit;

The code above did not work. Can you please help me with the code? Please add the proc and quit statements or anything else I need; I am new to SAS/SQL.


Upvotes: 0

Views: 198

Answers (4)

Richard

Reputation: 27498

In SQL you can group by a case expression or by a computed variable (aka column). In SAS PROC SQL the averaging function can be written as either MEAN or AVG.

Example grouping by computed column:

data have;
  do date = '01jan1900'd to '31dec2020'd;
    year = year(date);
    yearChar = put(year,4.);
    price = exp ((date - '01jan1940'd) / (365*12) );
    output;
  end;
  format date yymmdd10.;
run;

proc sql;
  create table want as 
  select
    case 
      when year between 1900 and 1924 then '1900 to 1924'
      when year between 1925 and 1949 then '1925 to 1949'
      when year between 1950 and 1974 then '1950 to 1974'
      when year between 1975 and 1999 then '1975 to 1999'
      when year between 2000 and 2017 then '2000 to 2017'
      else 'out of range'
    end
    as years
  , mean (price) as average_price
  from have
  group by years
  having years not in ('out of range')
;
quit;

will create a data set such as

years ($12)     average_price (double)
1900 to 1924       0.120
1925 to 1949       0.967
1950 to 1974       7.777
1975 to 1999      62.546
2000 to 2017     345.873

For the case of the year variable being a character type, you will need to convert the value to a numeric and use the converted value in the between expression involving numbers.

Example:

yearChar is a character column containing the year values. The input function converts a character string into a numeric value (if possible). The question mark ? suppresses the invalid-data message in the log when the conversion fails (such as when the year is **** or UNKN).

proc sql;
  create table want as 
  select
    case 
      when input(yearChar,?4.) between 1900 and 1924 then '1900 to 1924'
      when input(yearChar,?4.) between 1925 and 1949 then '1925 to 1949'
      when input(yearChar,?4.) between 1950 and 1974 then '1950 to 1974'
      when input(yearChar,?4.) between 1975 and 1999 then '1975 to 1999'
      when input(yearChar,?4.) between 2000 and 2017 then '2000 to 2017'
      else 'out of range'
    end
    as years
  , mean (price) as average_price
  from have
  group by years
  having years not in ('out of range')
;
quit;
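
To see what the conversion does on its own, here is a minimal sketch. The table name check and its four sample values are made up for illustration; they are not from the question's data.

```sas
/* Hypothetical sample data: two valid years and two unconvertible values */
data check;
  length yearChar $4;
  input yearChar $;
  datalines;
1995
2000
****
UNKN
;
run;

proc sql;
  /* yearNum is a number for valid years and missing (.) for the rest */
  select yearChar,
         input(yearChar, ?4.) as yearNum
  from check;
quit;
```

A missing value never satisfies any of the between conditions, so rows such as **** or UNKN fall through to the else branch and are labelled 'out of range'.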

Upvotes: 0

Gordon Linoff

Reputation: 1269503

I assume you really mean 1920-1924, 1925-1929, and so on, so that each year is counted exactly once.

You can use group by and arithmetic:

proc sql;
    select floor(year / 5) * 5 as from_year, 
           avg(price) as avg_price
    from summary
    group by floor(year / 5);   
quit;

If you want the end year as well:

proc sql;
    select floor(year / 5) * 5 as from_year, 
           floor(year / 5) * 5 + 4 as to_year, 
           avg(price) as avg_price
    from summary
    group by floor(year / 5);   
quit;
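
As a quick check of the arithmetic, the following sketch applies the same expressions to a handful of years. The table name buckets and the 1994 to 2001 range are assumptions chosen only for illustration.

```sas
/* Hypothetical years, used only to show how the buckets are assigned */
data buckets;
  do year = 1994 to 2001;
    from_year = floor(year / 5) * 5;  /* first year of the 5-year bucket */
    to_year   = from_year + 4;        /* last year of the same bucket */
    output;
  end;
run;

proc print data=buckets noobs;
run;
```

Here 1994 lands in the 1990 to 1994 bucket, 1995 through 1999 all map to 1995 to 1999, and 2000 starts the next bucket. The buckets always begin on a multiple of 5, so a six-year range such as 1995 to 2000 spans two of them.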

Upvotes: 1

SAS2Python

Reputation: 1297

The error you get seems to indicate that the variable year is a character string rather than a number. The following conversion should help:

  PROC SQL;
    SELECT mean(price) as average FROM have 
    WHERE 1995 <= input(year,8.) <= 2000 ;
  quit;
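
If it is unclear which type year has, one way to check before choosing between the two forms is PROC CONTENTS. This is a sketch that assumes the table is named have and sits in the WORK library, as in the query above.

```sas
/* Lists every variable in WORK.HAVE with its type (Char or Num) and length */
proc contents data=have;
run;
```

If year is listed as Num, the input() call is not needed and year can be compared to 1995 and 2000 directly.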

Upvotes: 0

ScaisEdge

Reputation: 133360

If you need an average for each year, then you need to group by year:

proc sql;
select year, avg(price) as avg_price
FROM summary
WHERE  year between 1995 and 2000
group by year;
quit;

Or, for custom year ranges, a simple way is a union:

proc sql;
   /* between is inclusive, so the boundary years 1960 and 1980 are counted in two ranges */
   select  'from 1940 to 1960' as years, avg(price) as avg_price
   from summary
   WHERE  year between 1940 and 1960
   union 
   select  'from 1960 to 1980', avg(price)
   from summary
   WHERE  year between 1960 and 1980
   union 
   select  'from 1980 to 2000', avg(price)
   from summary
   WHERE  year between 1980 and 2000;
quit;

Upvotes: 1
