Reputation: 11
I have this example dataset.
data WORK.EXAMPLE;
infile datalines delimiter=',' truncover;
input test_date date9. event_text :$100. date_of_event:date9. ALLE:$12.;
format test_date date_of_event ddmmyyd8.;
datalines4;
01JAN2020,event1,01JAN2020, method1
01JAN2020,event1,01JAN2020,method1
01JAN2020,event1,01JAN2020,method1
01JAN2020,event1,01JAN2020,method1
01JAN2020,event1,01JAN2020,method2
02JAN2020,event2,02JAN2020,method2
02JAN2020,event2,02JAN2020,method2
02JAN2020,event2,02JAN2020,method2
03JAN2020,.,.,.
03JAN2020,.,.,.
04JAN2020,event3,04JAN2020,method2
04JAN2020,event3,04JAN2020,method2
04JAN2020,event3,04JAN2020,method2
04JAN2020,event3,04JAN2020,method1
04JAN2020,event3,04JAN2020,method1
06JAN2020,.,.,.
06JAN2020,.,.,.
07JAN2020,.,.,.
07JAN2020,.,.,.
08JAN2020,event4,08JAN2020,method1
08JAN2020,event4,08JAN2020,method1
08JAN2020,event4,08JAN2020,method1
09JAN2020,event5, 09JAN2020,method1
09JAN2020,event5, 09JAN2020,method1
09JAN2020,event5, 09JAN2020,method1
09JAN2020,event5, 09JAN2020,method1
09JAN2020,event5, 09JAN2020,method1
;;;;
I wish to make the following plot, where CASES is the actual number of tests per test_date on the Y-axis. The X-axis should show the specific dates from date_of_event. The event_text should be placed above every event date, and a curve should illustrate the number of tests over time.
Upvotes: 0
Views: 380
Reputation: 27546
When you have lots of dates, labeling the data points becomes unreadable. With that many dates you may eventually want a stacked VBAR, or some sort of color gradient, instead of labels.
SAS
Example plots from 'large' data constructed from San Francisco covid testing data. The construction mimics what I believe your example data looks like.
%if not %sysfunc(exist(work.sfcovidtesting,data)) %then %do;
  * download and import only when the table is not already in WORK;
  filename sfcsv temp;
  proc http
    url='https://data.sfgov.org/api/views/nfpa-mg4g/rows.csv?accessType=DOWNLOAD'
    out=sfcsv;
  run;
  proc import out=sfcovidtesting datafile=sfcsv dbms=csv;
  run;
%end;
data sftests(label='real sf covid data slightly mangled for hans question');
  call streaminit(30122020);
  set sfcovidtesting;
  test_date = specimen_collection_date;

  * make 85% of the data not missing;
  if rand('uniform') < 0.85 then do;
    id+1;
    length event_text $10;
    event_text = cats('event_',id);
    event_date = test_date;
  end;
  else do;
    call missing (event_text, event_date);
  end;

  * limit this date to the first 1-3 methods, then write one row per test;
  methods = rand('integer',3);
  do _n_ = 1 to tests;
    length method $10;
    if event_date then
      method = scan('antigen antibody molecular', rand('integer',methods));
    output;
  end;

  format test_date event_date date9.;
  keep test_date event_text event_date method;
run;
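* pre plot summarization for series and needle;
proc sql;
  create table counts as
  select test_date, count(event_date) as count
  from sftests
  group by test_date
  order by test_date;

  * explicit ORDER BY so the BY-group processing below does not rely on implicit sorting;
  create table methods as
  select distinct test_date, method
  from sftests
  order by test_date, method;
quit;

* collapse the distinct methods of each date into one *-separated label;
data method_lists(keep=test_date methods);
  do until (last.test_date);
    set methods;
    by test_date;
    length methods $35;
    methods = catx('*',methods,method);
  end;
run;

data forplot;
  merge counts method_lists;
  by test_date;
  * a missing count lets the BREAK option leave a gap in the series;
  if count=0 then call missing(count);
run;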
ods html file='plot.html';

proc sgplot data=forplot;
  title 'SERIES plot from presummarized data';
  series x=test_date y=count / break;
run;

proc sgplot data=forplot;
  title 'SERIES plot from presummarized data';
  series x=test_date y=count / break datalabel=methods splitchar='*';
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;

proc sgplot data=forplot;
  title 'NEEDLE plot from presummarized data';
  needle x=test_date y=count / datalabel=methods splitchar='*';
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;

proc sgplot data=sftests;
  title 'VBAR plot from raw data';
  vbar test_date / group=method;
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;

ods html close;
Sample SGPLOT outputs
You could also do an SGPLOT plot where the point or needle color corresponds to the mix (or even weighted mix) of methods found in alle.
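As a rough sketch of that last idea, and assuming the forplot table built above (where methods holds the *-separated list of methods seen on each date), the GROUP= option of the NEEDLE statement can color each needle by its method mix. This is unweighted; a weighted mix would need an extra summarization step that is not shown here.

```sas
* sketch only: one needle color per distinct method mix;
* assumes work.forplot with test_date, count and methods as created earlier;
proc sgplot data=forplot;
  title 'NEEDLE plot colored by method mix';
  needle x=test_date y=count / group=methods;
  where test_date between '01mar2020'd and '01apr2020'd-1;
run;
```

With only three methods there are at most seven distinct mixes, so the legend stays readable; with many more methods a gradient on a numeric summary would probably scale better.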
Upvotes: 1
Reputation: 869
In R, you can use the ggplot2 package. With this package there is very little you cannot do, so yes, it is definitely possible to build your plot in R. Although I did not understand exactly what you want, here is a code example in R (using ggplot2) of what I think you want.
Code to produce the data:
x <- (101:300)/10
df <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 200),
  cases = (x^3) - (3*x)
)
df$important_breaks <- cut(
  df$cases,
  breaks = c(1000, 5000, 10000, 20000, 26910),
  labels = c("break1", "break2", "break3", "break4")
)
Code of the plot:
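library(ggplot2)

ggplot(df) +
  # filled area under the curve, colored by the break each value falls in
  geom_area(
    aes(x = date, y = cases, fill = important_breaks)
  ) +
  # the case curve itself (linewidth needs ggplot2 >= 3.4.0; older versions use size)
  geom_line(
    aes(x = date, y = cases),
    color = "black",
    linewidth = 1
  ) +
  theme(legend.position = "bottom") +
  # free-text note placed at a fixed date and height
  annotate(
    geom = "text",
    x = as.Date("2020-02-10"),
    y = 17000,
    label = "A very important note\nabout an event",
    family = "serif",
    size = 13/.pt
  ) +
  # arrow from the note to the curve; annotate() draws it once instead of once per row
  annotate(
    geom = "curve",
    x = as.Date("2020-02-10"), xend = as.Date("2020-02-20"),
    y = 14000, yend = 1500,
    arrow = arrow(length = unit(0.03, "npc"))
  )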
This is just a template; you will probably want to change colors and add more notes, so you might need to tweak a lot of my code, but it is possible to do what you want with the ggplot2 package. I strongly recommend visiting the R Graph Gallery to see proper examples of how powerful ggplot2 and R are with respect to graphics.
Upvotes: 1
Reputation: 63434
What you're describing is a basic line plot; however, depending on details you may need some additional work.
Here's the basic plot:
proc sgplot data=example;
  vline date_of_event / group=alle datalabel=event_text;
  xaxis type=time;
run;
group makes one line per alle value (two here), datalabel assigns the event text labels, and xaxis type=time makes the axis show all dates between the lowest and highest (or you can tell it what range to use).
However, this may not handle the zero results the way you want. The break option sometimes fixes this for you, but I don't think it does here. Instead, you may need to presummarize your data first.
You could use something like this:
proc means data=example nway completetypes;
  class date_of_event alle;
  output out=example_Sum n=;
run;

proc sgplot data=example_Sum;
  vline date_of_event / group=alle response=_FREQ_;
  xaxis type=time;
run;
You'd have to re-merge event_text afterwards, though, as it would not work well with completetypes. You'd also need to make sure each date you want plotted appears as a date_of_event in at least one row; if a date is totally missing, this doesn't really work.
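One possible way to do that re-merge is sketched below. It assumes the example and example_Sum tables from above and that each date_of_event carries a single event_text; the names event_labels and example_plot are made up for this illustration. Labels are blanked on the zero-count rows that completetypes filled in, so they only appear where tests were actually recorded.

```sas
* one label row per event date (assumes one event_text per date_of_event);
proc sort data=example(keep=date_of_event event_text
                       where=(not missing(date_of_event)))
          out=event_labels nodupkey;
  by date_of_event;
run;

* make sure the summary is in BY order before merging;
proc sort data=example_Sum;
  by date_of_event alle;
run;

* attach the label to every summary row of the same date;
data example_plot;
  merge example_Sum(in=in_sum) event_labels;
  by date_of_event;
  if in_sum;
  * no label on the zero-count rows added by completetypes;
  if _FREQ_ = 0 then call missing(event_text);
run;

proc sgplot data=example_plot;
  vline date_of_event / group=alle response=_FREQ_ datalabel=event_text;
  xaxis type=time;
run;
```

Because the label is repeated for each alle value on the same date, it may show up once per line; keeping it on only one of the groups would be a further tweak.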
Upvotes: 0