jpugliese

Removing duplicate labels and adjusting label colours with dotchart

I was wondering whether it's possible to remove duplicate labels (by collapsing them into a single, centred label) and to change the dot colours without changing the label colour.

I have three levels of grouping: categories, which are divided into outcomes, and, for each outcome, results for groups "1" and "2".

To illustrate my issue, here is a minimal example:

categories <- rep(c("X", rep("Z", 3), rep("Y", 2), rep("W", 2)), 2)
outcomes <- rep(c("A", "B", "C", "D", 
                  "E", "F", "G", "H"), 2)
treatment <- c(rep("1", 8), rep("2", 8))
coefficients_nap <- rep(0.1, 8)
coefficients_ns <- rep(0.1, 8)

coefficients <- c(coefficients_nap, coefficients_ns)


data = data.frame(categories, outcomes, treatment, coefficients)

data <- data[order(categories, outcomes),]

dotchart(data$coefficients, labels = data$outcomes, groups = data$categories, main = "Overview Table", cex=.7, pch=17, gcolor = "black", color = rep(c("darkgreen", "purple")),
         xlim=c(-0.2, 0.2))

It produces the following chart, in which the labels are duplicated; I also haven't found a way to set the colour of the points independently of the colour of the labels:

[Graph: dot chart produced by the code above]


Answers (1)

Martin

Re 1: removing duplicate labels

Replace the outcome labels that should not be displayed with NA.

Solution 1: duplicated() finds all duplicated labels, which are then replaced by NA (i.e. nothing is plotted). This assumes that duplicates always occur in runs, as in your example; otherwise it also removes non-consecutive duplicates (e.g. the first A, B and C in A, B, A, C, B, C).

Solutions 2 and 3 also work with non-consecutive duplicates: a label is removed from the plot only if it is part of a run of consecutive duplicates.

# Solution 1: simple; assumes identical labels always appear in consecutive rows
# duplicated(x, fromLast = TRUE) flags every element that occurs again later on,
# so only the last occurrence of each label is kept
data$outcomes_1 <- data$outcomes
data$outcomes_1[duplicated(data$outcomes_1, fromLast = TRUE)] <- NA
dotchart(data$coefficients, labels = data$outcomes_1, groups = data$categories, main = "Overview Table", cex=.7, pch=17, gcolor = "black", color = rep(c("darkgreen", "purple")),
         xlim=c(-0.2, 0.2))
duplicated(c(1,2,3,3,4,3,2,9)) # BUT: also flags non-consecutive duplicates (positions 6 and 7 here), not only the run at position 4
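A quick way to see why Solution 1 only suits consecutive runs is to compare labels that repeat in a row with labels that repeat further apart. The vectors `runs` and `mixed` are made up for illustration and are not the question's data:

```r
# Labels in consecutive runs, as in the question (each outcome appears twice in a row)
runs <- c("A", "A", "B", "B", "C", "C")
runs_blank <- runs
runs_blank[duplicated(runs_blank, fromLast = TRUE)] <- NA
print(runs_blank)   # c(NA, "A", NA, "B", NA, "C"): last label of each run kept

# Non-consecutive duplicates: duplicated() blanks every earlier occurrence,
# even when the repeats are not next to each other
mixed <- c("A", "B", "A", "C", "B", "C")
mixed_blank <- mixed
mixed_blank[duplicated(mixed_blank, fromLast = TRUE)] <- NA
print(mixed_blank)  # c(NA, NA, "A", NA, "B", "C")
```

In the second case none of the labels sit next to an identical one, yet three of them disappear.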


# Solution 2: more generic
rle(as.character(data$outcomes))$lengths # [1] 2 2 2 2 2 2 2 2   # rle() does not accept factors => convert to character
# then, for each run, replace all but the first (or last) element by NA (i.e. not displayed in the chart)
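Solution 2 is only outlined above. Here is a minimal sketch of it, assuming you want to keep either the first or the last label of each run; the function name `blank_runs_rle` and its `keep` argument are my own, not part of any package:

```r
# Sketch of "Solution 2": rle() finds runs of identical labels; keep one label
# per run (its first or last element) and set all other positions to NA.
blank_runs_rle <- function(labels, keep = c("last", "first")) {
  keep <- match.arg(keep)
  labels <- as.character(labels)   # rle() does not accept factors
  r <- rle(labels)
  ends <- cumsum(r$lengths)        # index of the last element of each run
  starts <- ends - r$lengths + 1   # index of the first element of each run
  out <- rep(NA_character_, length(labels))
  out[if (keep == "last") ends else starts] <- r$values
  out
}

blank_runs_rle(c("A", "A", "B", "B", "A"))           # c(NA, "A", NA, "B", "A")
blank_runs_rle(c("A", "A", "B", "B", "A"), "first")  # c("A", NA, "B", NA, "A")
```

With the question's data you would then pass `labels = blank_runs_rle(data$outcomes)` to dotchart(). The final "A" survives because it starts a new run.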


# Solution 3: works with non-consecutive duplicates, and is shorter
# Split the labels into chunks that start at each repeated label, then blank the last
# element of every chunk except the final one. This removes every label that is
# immediately followed by the same label, so only the last label of each run remains.
data$outcomes_3 <- data$outcomes
# data$outcomes_3 <- c(1,2,3,2,2,2,4,1,2,3,6,6,6,6,6,4) # example with non-consecutive duplicates
# factor() lets as.numeric() work whether outcomes is a factor or a character vector
(s <- split(data$outcomes_3, cumsum(c(1, diff(as.numeric(factor(data$outcomes_3))) == 0))))
if (length(s) > 1) {
  for (i in 1:(length(s) - 1)) s[[i]][length(s[[i]])] <- NA
}
data$outcomes_3 <- unlist(s, use.names = FALSE)
dotchart(data$coefficients, labels = data$outcomes_3, groups = data$categories, main = "Overview Table", cex=.7, pch=17, gcolor = "black", color = rep(c("darkgreen", "purple")),
         xlim=c(-0.2, 0.2))
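The split/loop in Solution 3 amounts to "blank every label that is immediately followed by the same label". A vectorised sketch of that rule (my restatement, with a hypothetical helper name `blank_repeats`), checked on the commented example vector from Solution 3:

```r
# Vectorised restatement of Solution 3: blank every label that is immediately
# followed by the same label, so only the last label of each run survives.
blank_repeats <- function(labels) {
  x <- as.character(labels)   # works for factors and character vectors
  n <- length(x)
  # x[-1] == x[-n] compares each element with its successor; which() drops NAs
  x[which(x[-1] == x[-n])] <- NA
  x
}

ex <- c(1, 2, 3, 2, 2, 2, 4, 1, 2, 3, 6, 6, 6, 6, 6, 4)
blank_repeats(ex)
# c("1", "2", "3", NA, NA, "2", "4", "1", "2", "3", NA, NA, NA, NA, "6", "4")
```

The separate occurrences of 1, 2, 3 and 4 are kept; only the runs (2, 2, 2 and 6, 6, 6, 6, 6) are reduced to a single label.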

Re 2: colors

Not sure why you need this, but as far as I know dotchart() uses its color argument for both the points and the labels, so for full control over the label colors you need to adjust the dotchart() function itself:

  • type dotchart (without parentheses) at the console to print its source code
  • copy the printed code into an R editor and assign it to a new function name
  • change the color assignments (e.g. for the labels) as you like
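If editing dotchart() is more than you need, a possible shortcut is to use one of the fillable symbols (pch 21 to 25): for those, color sets only the outline (and the labels), while bg sets the fill. The sketch below rebuilds the question's data, blanks repeated labels, and assumes treatment 1 should be dark green and treatment 2 purple; pch = 24 is the fillable counterpart of the triangle pch = 17. I have not checked whether every R version reorders bg together with groups inside dotchart(), so check in the plot that the fills match the treatments.

```r
# Rebuild the example data from the question (same values)
categories   <- rep(c("X", rep("Z", 3), rep("Y", 2), rep("W", 2)), 2)
outcomes     <- rep(c("A", "B", "C", "D", "E", "F", "G", "H"), 2)
treatment    <- c(rep("1", 8), rep("2", 8))
coefficients <- rep(0.1, 16)

data <- data.frame(categories, outcomes, treatment, coefficients,
                   stringsAsFactors = FALSE)
data <- data[order(data$categories, data$outcomes), ]

# Keep only the last label of each run of identical outcomes
lab <- data$outcomes
lab[which(lab[-1] == lab[-length(lab)])] <- NA

# Fill colour per row from the treatment group (assumed mapping)
pt_bg <- unname(c("1" = "darkgreen", "2" = "purple")[data$treatment])

# pch = 24 is a fillable triangle: color = outline and label colour,
# bg = fill colour, so labels stay black while the points are coloured
dotchart(data$coefficients, labels = lab, groups = factor(data$categories),
         main = "Overview Table", cex = 0.7, pch = 24,
         color = "black", bg = pt_bg, gcolor = "black",
         xlim = c(-0.2, 0.2))
```

The labels stay black because color = "black"; only the triangle fills carry the group colours.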
