Reputation: 3555
I'd like to split the Depth values for Potatoes into two groups, because the histogram shows two normal distributions mixed in one variable. I want to make two categories in my dataset so that I can treat them separately. Here I want to split the dataset at a value of approximately 12.
If possible, is there a way to do this in dplyr? Also, if I have many species in my first column and want to do this for only one species and specific traits (my original dataset has more traits than Depth), how can I specify a split that depends on both the species and the trait?
To clarify: is it possible to split the data at the minimum between the two peaks of this bimodal sample, i.e. somewhere between the mean of peak one and the mean of peak two?
This is the dataset:
structure(list(Species1 = c("Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes", "Potatoes",
"Potatoes", "Potatoes", "Potatoes", "Potatoes"), Depth = c(10.3,
10.47, 12.48, 9.48, 13.07, 12.25, 10.1, 9.38, 9.04, 11.25, 12.52,
9.96, 10.74, 10.13, 10.88, 12.66, 9.8, 10.7, 9.71, 10.51, 9.67,
9.12, 11.15, 9.82, 10.21, 10.33, 12.06, 9.58, 9.45, 13.79, 12.61,
10.97, 10.98, 11.83, 12.52, 12.48, 10.25, 9.67, 9.58, 11, 11.02,
10.34, 10.09, 12.27, 10.34, 12.5, 10.03, 9.87, 10.38, 10.24,
10.77, 10.36, 10.63, 9.76, 10.11, 8.69, 12.88, 9.86, 10.7, 10.93,
10.26, 12.06, 10.43, 11.39, 10.56, 9.68, 11.42, 9.55, 11.29,
8.69, 12.59, 13.92, 12.31, 10.08, 10.14, 10.21, 12.6, 11.24,
10.72, 12.3, 12.06, 9.64, 9.77, 10.18, 10.78, 10.18, 11.36, 9.69,
12.47, 10.73, 9.12, 9.81, 10.69, 12.39, 10.2, 9.86, 12.79, 9.93,
10.39, 11.63, 10.57, 10.55, 9.09, 11.15, 10.02, 10.94, 10.66,
9.55, 10.29, 12.04, 10.63, 9.17, 9.78, 10.05, 8.75, 10.99, 13.65,
9.63, 9.83, 13.61, 11.53, 12.46, 13.55, 11.71, 11.97, 9.62, 10.29,
11.34, 10.8, 10.35, 9.22, 10.66, 9.52, 13.17, 12.14, 12.48, 12.3,
10.63, 11.01, 10.3, 9.94, 9.67, 11.73, 9.24, 10.55, 9.96, 10.62,
9.21, 10.88, 9.5, 9.92, 9.79, 10.13, 11.82, 9.68, 10.39, 8.99,
8.68, 10.66, 10.01, 13.26, 11.99, 9.89, 10.68, 11.14, 9.63, 10.96,
10.7, 9.83, 9.79, 9.37, 10.21, 7.58, 10.5, 9.09, 11.79, 11.98,
9.81, 9.68, 8.86, 8.9, 9.55, 10.26, 9.83, 10.17, 11.01, 9.95,
9.49, 9.65, 9.64, 10.55, 10.12, 10.78, 9.61, 10.47, 9.81, 10.81,
9.17, 10.75, 12.35, 10.1, 10.29, 12.02, 9.75, 9.84, 10.04, 10.01,
9.95, 9.09, 9.26, 10.89, 10.83, 8.84, 12.11, 9.32, 9.37, 9.01,
10.33, 9.79, 8.51, 9, 10.12, 9.61, 12.59, 9.6, 8.96, 12.03, 9.83,
11.74, 9.41, 9.56, 9.6, 11.4, 12.91, 9.66, 9.67, 9.31, 11.23,
11.02, 9.16, 12.08, 12.16, 8.55, 11.9, 8, 13.56, 9.28, 10.24,
9.6, 12.63, 12.7, 10.17, 10.09, 12.92, 9.69, 10.58, 10.05, 10.36,
9.18)), .Names = c("Species1", "Depth"), row.names = c(NA, -259L
), class = "data.frame")
Upvotes: 0
Views: 194
Reputation: 2570
I'm not sure I understood you correctly, but simply
library(dplyr)
df1 <- df %>% filter(Depth > 12)
df2 <- df %>% filter(Depth <= 12)
splits your dataset in two at 12 with dplyr.
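If you would rather not pick 12 by eye, one possible approach is to take the lowest point of a kernel density estimate between the two peaks. This is only a sketch: the data below are simulated as a stand-in for your Depth column, and `lower_peak` and `upper_peak` are rough guesses read off a histogram, not values computed from your data.

```r
# Simulated stand-in for the Depth column (NOT the data from the question)
set.seed(1)
depth <- c(rnorm(200, mean = 10, sd = 0.6), rnorm(60, mean = 12.5, sd = 0.5))

# Rough peak locations, assumed from looking at the histogram
lower_peak <- 10
upper_peak <- 12.5

# Kernel density estimate evaluated on a regular grid
d <- density(depth)

# Keep only the grid points that lie between the two assumed peaks
between <- d$x > lower_peak & d$x < upper_peak

# The cutoff is the grid point with the lowest estimated density in that range
cutoff <- d$x[between][which.min(d$y[between])]
cutoff
```

You could then use `cutoff` in place of 12 in the filters. The result depends on the bandwidth that `density()` chooses, so it is worth plotting `d` to check that the minimum lands where you expect.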
Or add a new grouping variable, as Heroka said, using the same boundary as the filters above (values of exactly 12 go to class 1):
df3 <- df %>% mutate(DepthClass = ifelse(Depth <= 12, 1, 2))
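For the case with several species, a sketch along the same lines is to put the species into the condition, so that only that species gets a class and everything else stays NA. The data frame `df_multi` and the species "Carrots" are made up for illustration; only `Species1`, `Depth` and the cutoff of 12 come from the question.

```r
library(dplyr)

# Hypothetical multi-species data; "Carrots" and all values are invented
df_multi <- data.frame(
  Species1 = c("Potatoes", "Potatoes", "Potatoes", "Carrots", "Carrots"),
  Depth    = c(9.5, 12.0, 13.1, 12.8, 8.2)
)

# Classify Depth only for Potatoes; other species are left as NA
df_multi <- df_multi %>%
  mutate(DepthClass = case_when(
    Species1 == "Potatoes" & Depth <= 12 ~ 1,
    Species1 == "Potatoes" & Depth >  12 ~ 2,
    TRUE                                 ~ NA_real_
  ))
```

For another trait or another species you would add more conditions of the same form, each with its own column and cutoff.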
Upvotes: 1