Reputation: 87
I have a data frame with IDs and values, and I would like to generate a density plot for every unique ID to check whether each distribution is normal or skewed. There are also NA values and I am not sure how to treat them. Should I just remove them before creating the density plot? Also, the range of the values differs between the IDs.
| ID | Values |
| -------- | ------- |
| F1 | 45 |
| F1 | 56 |
| F1 | NA |
| F1 | 68 |
| F1 | 55 |
| F2 | 23 |
| F2 | 44 |
| F2 | 34 |
| F2 | NA |
| F2 | NA |
| F2 | 34 |
| F3 | 5055 |
| F3 | 4567 |
| F3 | NA |
| F3 | 4789 |
| F3 | 5567 |
| F3 | 6002 |
| F4 | 9045 |
| F4 | 9500 |
| F4 | 9760 |
| F4 | NA |
| F4 | 9150 |
Please help, as I am a beginner with visualizations.
Upvotes: 0
Views: 166
Reputation: 46908
You don't need to remove the NAs yourself; ggplot2 drops them (with a warning) when it draws the plot. You have at most 5 values per ID in your dataset, so a density plot is not very informative. Because the ranges differ so much between IDs, for your example above we can put the values on a log10 scale and try a density:
ggplot(df, aes(x = Values, fill = ID)) + geom_density(alpha = 0.5) + scale_x_log10()
A stripchart might be more useful:
ggplot(df, aes(x = Values, y = ID)) + geom_jitter(width = 0, height = 0.1) + scale_x_log10()
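If you still want one density per ID and a rough normality check, here is a minimal sketch. It assumes the data frame is named `df` with columns `ID` and `Values`, rebuilt here from the table in the question; `df_clean` and `p_values` are just illustrative names. It drops the NAs explicitly, draws one panel per ID with free scales so the different ranges don't squash each other, and runs a Shapiro-Wilk test per ID. With only 4 or 5 values per group the test has very little power, so treat the p-values as a rough hint at most.

```r
library(ggplot2)

# Rebuild the example data from the question (NA marks a missing value)
df <- data.frame(
  ID = rep(c("F1", "F2", "F3", "F4"), times = c(5, 6, 6, 5)),
  Values = c(45, 56, NA, 68, 55,
             23, 44, 34, NA, NA, 34,
             5055, 4567, NA, 4789, 5567, 6002,
             9045, 9500, 9760, NA, 9150)
)

# Drop missing values explicitly so ggplot2 does not warn about removed rows
df_clean <- df[!is.na(df$Values), ]

# One panel per ID; free scales let each ID use its own range
p <- ggplot(df_clean, aes(x = Values)) +
  geom_density() +
  facet_wrap(~ ID, scales = "free")
print(p)

# Shapiro-Wilk p-value per ID (needs at least 3 non-missing values per group)
p_values <- tapply(df_clean$Values, df_clean$ID,
                   function(v) shapiro.test(v)$p.value)
print(p_values)
```

Faceting with `scales = "free"` is an alternative to the log10 axis: each ID keeps its original units, at the cost of the panels no longer being directly comparable.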
Upvotes: 1