hansgans

Reputation: 11

Time series graph with events in SAS or R

I have this example dataset.

data WORK.EXAMPLE;
  infile datalines delimiter=',' truncover; 
  input test_date :date9.  event_text :$100.  date_of_event :date9.  ALLE :$12.;
  format test_date  date_of_event  ddmmyyd8.;
datalines4;
01JAN2020,event1,01JAN2020, method1
01JAN2020,event1,01JAN2020,method1
01JAN2020,event1,01JAN2020,method1
01JAN2020,event1,01JAN2020,method1
01JAN2020,event1,01JAN2020,method2
02JAN2020,event2,02JAN2020,method2
02JAN2020,event2,02JAN2020,method2
02JAN2020,event2,02JAN2020,method2
03JAN2020,.,.,.
03JAN2020,.,.,.
04JAN2020,event3,04JAN2020,method2
04JAN2020,event3,04JAN2020,method2
04JAN2020,event3,04JAN2020,method2
04JAN2020,event3,04JAN2020,method1
04JAN2020,event3,04JAN2020,method1
06JAN2020,.,.,.
06JAN2020,.,.,.
07JAN2020,.,.,.
07JAN2020,.,.,.
08JAN2020,event4,08JAN2020,method1 
08JAN2020,event4,08JAN2020,method1  
08JAN2020,event4,08JAN2020,method1  
09JAN2020,event5, 09JAN2020,method1 
09JAN2020,event5, 09JAN2020,method1 
09JAN2020,event5, 09JAN2020,method1 
09JAN2020,event5, 09JAN2020,method1 
09JAN2020,event5, 09JAN2020,method1 
;;;;

I wish to make the following plot, where CASES on the Y-axis is the actual number of tests per test_date. The X-axis should show the specific dates from date_of_event. The event_text should be placed above every event date, and a curve should show the number of tests over time.

Upvotes: 0

Views: 380

Answers (3)

Richard

Reputation: 27546

When you have lots of dates, labels on the data points become unreadable. You may eventually want a stacked VBAR or some sort of color gradient instead.

SAS

The example plots below use a larger dataset built from San Francisco COVID testing data. The construction mimics what I believe your EXAMPLE data looks like.

%if not %sysfunc(exist(work.sfcovidtesting,data)) %then %do;
  filename sfcsv temp;

  /* download the CSV only when the imported table does not exist yet */
  proc http 
    url='https://data.sfgov.org/api/views/nfpa-mg4g/rows.csv?accessType=DOWNLOAD'
    out=sfcsv;
  run;

  proc import out=sfcovidtesting datafile=sfcsv dbms=csv;
  run;
%end;


data sftests(label='real sf covid data slightly mangled for hans question');
  call streaminit(30122020);

  set sfcovidtesting;

  test_date = specimen_collection_date;

  * make 85% of the data not missing;
  if rand('uniform') < 0.85 then do;
    id+1;
    length event_text $10;
    event_text = cats('event_',id);
    event_date = test_date;
  end;
  else do;
    call missing (event_text, event_date);
  end;

  methods = rand('integer',3);
  do _n_ = 1 to tests;
    length method $10;
    if event_date then 
      method = scan('antigen antibody molecular', rand('integer',methods));

    output;
  end; 

  format test_date event_date date9.;

  keep test_date event_text event_date method;
run;

* pre plot summarization for series and needle;

proc sql;
  create table counts as
  select test_date, count(event_date) as count
  from sftests
  group by test_date;

  /* ordered so the BY test_date step below can rely on the sort */
  create table methods as
  select distinct test_date, method
  from sftests
  order by test_date, method;
quit;

data method_lists(keep=test_date methods);
  do until (last.test_date);
    set methods;
    by test_date;

    length methods $35;
    methods = catx('*',methods,method);
  end;
run;

data forplot;
  merge counts method_lists;
  by test_date;
  if count=0 then call missing(count);
run;



ods html file='plot.html';
proc sgplot data=forplot;
  title 'SERIES plot from presummarized data';
  series x=test_date y=count / break;
run;

proc sgplot data=forplot;
  title 'SERIES plot from presummarized data';
  series x=test_date y=count / break datalabel=methods splitchar='*';
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;

proc sgplot data=forplot;
  title 'NEEDLE plot from presummarized data';
  needle x=test_date y=count / datalabel=methods splitchar='*';
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;

proc sgplot data=sftests;
  title 'VBAR plot from raw data';
  
  vbar test_date / group=method;
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;
ods html close;

Sample SGPLOT outputs

[Images not shown: the two SERIES plots, the NEEDLE plot, and the VBAR plot produced by the code above]

You could also create an SGPLOT plot in which the point or needle color corresponds to the mix (or even weighted mix) of methods found in ALLE.

Upvotes: 1

Pedro Faria

Reputation: 869

In R, you can use the ggplot2 package. With this package, there is very little you cannot do, so yes, it is definitely possible to build your plot in R. Although I did not understand exactly what you want, here is a code example in R (using ggplot2) of what I think you want.

Code to produce the data:

x <- (101:300)/10

df <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 200),
  cases = (x^3) - (3*x)
)

df$important_breaks <- cut(
  df$cases,
  breaks = c(1000, 5000, 10000, 20000, 26910),
  labels = c("break1", "break2", "break3", "break4")
)

Code of the plot:

library(ggplot2)

ggplot(df) +
  geom_area(
    aes(x = date, y = cases, fill = important_breaks)
  ) +
  geom_line(
    aes(x = date, y = cases),
    color = "black",
    linewidth = 1  # use size = 1 on ggplot2 versions before 3.4.0
  ) +
  theme(legend.position = "bottom") +
  annotate(
    geom = "text",
    x = as.Date("2020-02-10"),
    y = 17000,
    label = "A very important note\nabout an event",
    family = "serif",
    size = 13/.pt
  ) + 
  # annotate() draws the arrow once; geom_curve() with constant aes()
  # values would redraw it for every row of df
  annotate(
    geom = "curve",
    x = as.Date("2020-02-10"), xend = as.Date("2020-02-20"),
    y = 14000, yend = 1500,
    arrow = arrow(length = unit(0.03, "npc"))
  )

This is just a template. You will probably want to change colors and add more notes, so you might need to tweak a lot of my code, but it is possible to do what you want with the ggplot2 package. I strongly recommend visiting the R Graph Gallery to see proper examples of how powerful ggplot2 and R are for graphics.

Plot generated: [image not shown]
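
To move the template closer to the data in the question, here is a minimal sketch that counts tests per test_date and writes event_text above the dates that have an event. The toy data frame and the names daily, events, and n are made up for illustration and only imitate the shape of the EXAMPLE table; the plotting part is skipped if ggplot2 is not installed.

```r
# Toy data shaped like the question's EXAMPLE table (values are illustrative).
example <- data.frame(
  test_date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                         "2020-01-03", "2020-01-04", "2020-01-04")),
  event_text = c("event1", "event1", "event2", NA, "event3", "event3"),
  stringsAsFactors = FALSE
)

# Number of tests per test date (one input row = one test).
daily <- aggregate(n ~ test_date, data = transform(example, n = 1L), FUN = sum)

# One label per date that has an event, merged back onto the daily counts.
events <- unique(example[!is.na(example$event_text),
                         c("test_date", "event_text")])
daily <- merge(daily, events, by = "test_date", all.x = TRUE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(daily, aes(x = test_date, y = n)) +
    geom_line() +
    geom_point() +
    # label only the dates that actually carry an event
    geom_text(
      data = subset(daily, !is.na(event_text)),
      aes(label = event_text),
      vjust = -1
    ) +
    scale_x_date(date_breaks = "1 day", date_labels = "%d%b") +
    labs(x = "Date", y = "Cases")
  print(p)
}
```

Keeping the counting step separate from the plot makes it easy to check the numbers before styling anything.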

Upvotes: 1

Joe

Reputation: 63434

What you're describing is a basic line plot; however, depending on details you may need some additional work.

Here's the basic plot:

proc sgplot data=example;
  vline date_of_event/group=alle datalabel=event_text;
  xaxis type=time;
run;

GROUP= draws one line per ALLE value (two here), DATALABEL= assigns the event text labels, and XAXIS TYPE=TIME makes the axis show all dates between the lowest and highest (or you can tell it what range to use).

However, this may not handle the zero results as you would want. The break option sometimes fixes this for you, but I don't think it does here. Instead, you may need to presummarize your data first.

You could use something like this:

proc means data=example nway completetypes ;
  class date_of_event alle;
  output out=example_Sum n=;
run;

proc sgplot data=example_Sum;
  vline date_of_event/group=alle response=_FREQ_  ;
  xaxis type=time;
run;

You'd have to merge event_text back on afterwards, though, as it would not work well with completetypes. You'd also need to update the dataset so that every date appears in date_of_event at least once; if a date is totally missing, this doesn't really work.
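
For readers taking the R route, the same presummarize-then-merge idea can be sketched in base R. This is only an illustration under assumed toy data: the data frame below and the names counts, events, and freq are hypothetical, and table() is used as a rough stand-in for what COMPLETETYPES does, not as an exact equivalent.

```r
# Toy data with the same columns as the question's EXAMPLE table.
example <- data.frame(
  date_of_event = as.Date(c("2020-01-01", "2020-01-01",
                            "2020-01-02", "2020-01-04")),
  alle          = c("method1", "method2", "method2", "method1"),
  event_text    = c("event1", "event1", "event2", "event3"),
  stringsAsFactors = FALSE
)

# table() keeps every date x method combination, including zero counts.
counts <- as.data.frame(
  table(date_of_event = example$date_of_event, alle = example$alle),
  responseName = "freq",
  stringsAsFactors = FALSE
)
counts$date_of_event <- as.Date(counts$date_of_event)

# Merge the event text back on, one label per date.
events <- unique(example[, c("date_of_event", "event_text")])
counts <- merge(counts, events, by = "date_of_event", all.x = TRUE)
counts <- counts[order(counts$date_of_event, counts$alle), ]

print(counts)
```

As in the SAS version, a date that never occurs in date_of_event gets no row here either, so it would have to be added separately.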

Upvotes: 0
