I have created a plot, in which I want to color code values. One subset of values will be one color, another subset a different color, and the remaining values another color.
The subsets work like this: I have sorted the data frame based on one column. I have created a subset for the bottom 10 values, and the top 10 values. I want to color code the values of column NY, corresponding to those top 10 or bottom 10 values in column Total. So my NY values won't be sorted, but they will be corresponding to those sorted values in Total.
The only problem is, there are duplicates of certain values in the NY column, that do not lie within the top 10 or bottom 10. For example: 5 lies within the bottom 10 subset. But there is also another instance of 5, not within the bottom 10.
So rather than color coding only the bottom 10, my code color codes all instances of those values in the bottom 10. For example, 12 or 13 values are colored rather than 10.
I may have much more code in my plot() call than I actually need, but it works for me apart from the problem I am facing:
upper10<-tail(statedata[order(Total),],10)
lower10<-head(statedata[order(Total),],10)
plot(State,NY,type="p",pch=ifelse(NY %in% lower10$NY,
0, ifelse(NY %in% upper10$NY, 1, 2)),
col=ifelse(NY %in% lower10$NY,
"green3", ifelse(NY %in% upper10$NY, "red", "black")),
main="New York")
Basically, what I'm trying to do is make sure that ONLY the bottom 10 values are green. This code changes all instances of those values within the entire data frame to green, because there are duplicates. So now I am stuck.
Apologies if this is confusing. If it's too confusing, I can try to further clarify.
EDIT: Added some data:
DET NY CHI Total
2.6 9.3 23.0 15.8
5.0 6.3 25.3 32.1
5.9 5.0 31.5 18.4
7.1 11.9 18.7 13.8
7.5 11.8 17.3 3.0
4.1 1.0 10.7 8.0
10.1 48.8 4.7 45.0
This is just a snippet. I sorted Total, and then based on the values in the sorted lower10 or upper10, color code the values in NY.
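The duplicate problem described above can be reproduced in a few lines. This is only a minimal sketch on made-up data: the `toy` data frame and the names `lower2`, `by_value`, and `by_row` are hypothetical and not part of the original script. It compares matching on NY values with matching on row names.

```r
# Hypothetical toy data: the value 5 appears twice in NY,
# but only the first occurrence belongs to the two smallest Totals.
toy <- data.frame(NY    = c(5, 9, 5, 7, 8, 6),
                  Total = c(1, 2, 10, 3, 11, 12))

# Two rows with the smallest Total (rows 1 and 2, NY = 5 and 9)
lower2 <- head(toy[order(toy$Total), ], 2)

# Matching on values flags every row whose NY equals 5 or 9
by_value <- toy$NY %in% lower2$NY

# Matching on row names flags only the rows that are actually in lower2
by_row <- rownames(toy) %in% rownames(lower2)

sum(by_value)  # 3 rows flagged
sum(by_row)    # 2 rows flagged
```

Value matching flags the second 5 as well, which is the same overcounting that turns 12 or 13 points green instead of 10.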
Upvotes: 1
Views: 254
I'm sure there are more efficient ways to accomplish this, but one way to do it without substantially changing your main code is to work with indices in the ifelse() statement inside the plot() function. I slightly changed the data frame and the subsets to generate a quick and dirty reproducible example.
The script is pretty much the same as your original code, except that it locates the match by cross-checking the corresponding row indices in the lower5 and upper5 subsets. This can be done using the rownames() function, as seen below.
Run it and let me know if this is what you were looking for and if you need further clarification.
#Define a data frame for demonstration purposes
df <- data.frame(DET=1:20, NY=21:40, CHI=41:60, Total=100:81)
#Subset the lower and upper 5 rows after sorting the data frame (df) by the Total column
#(order() sorts ascending, so head() returns the smallest totals and tail() the largest)
lower5 <- head(df[order(df$Total),], 5)
upper5 <- tail(df[order(df$Total),], 5)
#Plot the NY column from df and mark each point according to whether its row name appears in the lower5 or upper5 subset
plot(df$NY,
     type="p",
     pch=ifelse(rownames(df) %in% rownames(lower5), 0, ifelse(rownames(df) %in% rownames(upper5), 1, 2)),
     col=ifelse(rownames(df) %in% rownames(lower5), 'green3', ifelse(rownames(df) %in% rownames(upper5), 'red', 'black')),
     main="New York")
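As a possible variation on the same idea, the row positions returned by order() can be used directly to build a group label per row, which avoids the nested ifelse() calls. This is a sketch under assumptions, not a drop-in replacement: the names `demo`, `ord`, `grp`, `cols`, and `pchs` are made up for illustration, and NY deliberately contains duplicates to show that they do not affect the grouping.

```r
# Hypothetical demo data; NY contains repeated values on purpose
demo <- data.frame(NY = c(rep(5, 3), 21:37), Total = 100:81)

# Row positions sorted by Total, ascending
ord <- order(demo$Total)

# Label every row by position, not by NY value
grp <- rep("middle", nrow(demo))
grp[head(ord, 5)] <- "lower"   # 5 smallest Totals
grp[tail(ord, 5)] <- "upper"   # 5 largest Totals

# Lookup tables from group label to colour and plotting symbol
cols <- c(lower = "green3", upper = "red", middle = "black")
pchs <- c(lower = 0, upper = 1, middle = 2)

plot(demo$NY,
     type = "p",
     pch  = unname(pchs[grp]),
     col  = unname(cols[grp]),
     main = "New York")

table(grp)
```

Because the labels are assigned by row position, exactly five rows end up in each of the "lower" and "upper" groups, however many times a value repeats in NY.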
Upvotes: 2