pluke

Reputation: 4346

Suppressing data from a graph in R

I have a dataset, d, that contains personally identifiable data. Values that have been suppressed are stored as an X:

column1     column2     column3
*           FSM         X
*           Male        2.5
*           Female      X
A           FSM         6
A           Male        10.3
A           Female      11.7
B           FSM         14.8
B           Male        21.5
B           Female      25.3

I want to plot this with an X above the bars in a bar plot, where data has been suppressed, such as:

desired output

My code is:

p <- ggplot(d, aes(x=column1, y=column3, fill=column2)) + 
  geom_bar(position=position_dodge(), stat="identity", colour="black") +
  geom_text(aes(label=column2), position=position_dodge(width=0.9), vjust=-.5) +
  scale_y_continuous("Percentage", breaks=seq(0, max(d$column3), 2))

But of course, it can't plot 'X' on the graph and says:

Error: Discrete value supplied to continuous scale

How can I get the bar plotting to ignore the 'X' and still add the label if it's present?

Data dump:

structure(list(column1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L), .Label = c("*", 
"A", "B", "C", "D", "E", "U"), class = "factor"), column2 = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L), .Label = c("FSM", "Male", "Female"), class = "factor"), 
    column3 = structure(c(21L, 1L, 2L, 18L, 3L, 4L, 7L, 12L, 
    14L, 16L, 15L, 13L, 10L, 9L, 8L, 11L, 6L, 5L, 20L, 19L, 17L
    ), .Label = c("1.93889541715629", "1.97444831591173", "10.1057579318449", 
    "11.7305458768873", "12.7758420441347", "14.4535840188014", 
    "14.8471615720524", "18.5830429732869", "19.9764982373678", 
    "20.0873362445415", "20.9606986899563", "21.5628672150411", 
    "24.1579558652729", "25.3193960511034", "25.7931844888367", 
    "29.2576419213974", "5.45876887340302", "6.11353711790393", 
    "6.16921269095182", "6.98689956331878", "X"), class = "factor")), .Names = c("column1", 
"column2", "column3"), row.names = c(NA, -21L), class = "data.frame")

I'm happy to show a 0 where there are 0 instances, but where data has been suppressed I want to make that clear by printing an 'X', even though the bar itself will also show 0.

Upvotes: 0

Answers (1)

Spacedman

Reputation: 94182

First convert the height column (column3) to numeric, which gives NA for the suppressed values. Then create a label column based on that. Finally, add a column of zeroes to use as the y coordinate of the labels.

> d$column3=as.numeric(as.character(d$column3))
Warning message:
NAs introduced by coercion 
> d$column4 = ifelse(is.na(d$column3),"X","")
> d$y=0
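
The as.character() step matters because column3 is a factor: calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A minimal sketch on a toy factor (the values here are made up for illustration, and suppressWarnings() is only there to hide the expected coercion warning):

```r
# Toy factor mimicking column3: numbers stored as text, "X" for suppressed values
f <- factor(c("X", "2.5", "10.3"))

# Not what we want: as.numeric() on a factor returns the integer level codes
codes <- as.numeric(f)

# Go through character first; "X" cannot be parsed as a number, so it becomes NA
vals <- suppressWarnings(as.numeric(as.character(f)))

# Label vector: "X" where the value was suppressed, empty string otherwise
lab <- ifelse(is.na(vals), "X", "")
```

Comparing codes with vals shows the difference: codes holds small integers that index the factor levels, while vals holds the real numbers with NA in place of the X.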

Then:

> p <- ggplot(d, aes(x=column1, y=column3, fill=column2))
> p + geom_bar(position=position_dodge(), stat="identity",
   colour="black") + 
 geom_text(aes(label=column4,x=column1,y=y),
   position=position_dodge(width=1), vjust=-0.5)

Giving:

labelled Bar
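
For reference, the steps above can be collected into one script. This is only a sketch: it assumes ggplot2 is installed, it uses a small hand-typed subset of the question's table rather than the full data dump, and it sets the same dodge width (0.9) on both layers so the labels line up with the bars, which is a choice made here rather than something taken from the code above. The "Percentage" axis title comes from the question.

```r
library(ggplot2)

# Small illustrative subset of the question's table (not the full data dump)
d <- data.frame(
  column1 = rep(c("*", "A", "B"), each = 3),
  column2 = factor(rep(c("FSM", "Male", "Female"), 3),
                   levels = c("FSM", "Male", "Female")),
  column3 = c("X", "2.5", "X", "6", "10.3", "11.7", "14.8", "21.5", "25.3"),
  stringsAsFactors = TRUE
)

# Suppressed values become NA; the warning about coercion is expected
d$column3 <- suppressWarnings(as.numeric(as.character(d$column3)))

# "X" label for suppressed rows, empty label otherwise
d$column4 <- ifelse(is.na(d$column3), "X", "")

# Labels sit at the baseline (y = 0)
d$y <- 0

p <- ggplot(d, aes(x = column1, y = column3, fill = column2)) +
  geom_bar(position = position_dodge(width = 0.9), stat = "identity",
           colour = "black", na.rm = TRUE) +   # na.rm silences the missing-value warning
  geom_text(aes(label = column4, y = y),
            position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous("Percentage")

# print(p) draws the plot
```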

It's a variant on labelling a geom_bar with the value of the bar. Almost a dupe.

Upvotes: 3