Reputation: 2710
I have a dataset data.csv with around 180 variables (words) and 3000 samples (cases), and it looks like this (excerpt):
I am running decorana and plotting a cluster using kmeans and fviz_cluster:
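library(vegan)       # decorana(), scores()
library(factoextra)  # fviz_cluster()
df <- read.csv("data.csv")
# Detrended correspondence analysis on the log1p-transformed data
DCA <- decorana(veg = log1p(df))
species.scores <- as.data.frame(scores(DCA, "species"))
geom.text.size <- 1
theme.size <- (14/5) * geom.text.size
set.seed(123)
# k-means on the species scores, 4 clusters, 25 random starts
km.res <- kmeans(species.scores, 4, nstart = 25)
fviz_cluster(km.res, geom = "text", data = species.scores, labelsize = 4)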
This results in a satisfying cluster graph:
Is it possible to layer the samples on top of this variable cluster plot? That would help show which samples are positioned in which cluster.
Any suggestions on how to achieve something like that?
Upvotes: 2
Views: 60
Reputation: 886938
If we need to convert values to 0, we can multiply by a logical vector: FALSE is coerced to 0, so the product is 0, while TRUE is coerced to 1 and returns the original value (assuming the column is numeric).
library(dplyr)
df %>%
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  mutate(Calculate = Calculate * !(row_number() == n() & Value > 10)) %>%
  ungroup
-output
# A tibble: 5 × 4
     ID Period Value Calculate
  <dbl>  <dbl> <dbl>     <dbl>
1     1      1    10        10
2     1      2    12        24
3     1      3    11         0
4     5      1     4         4
5     5      2     6        12
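To see the multiplication trick in isolation, here is a minimal base R sketch. The vector `x` and the mask `is_last_and_large` are made-up illustration values, not taken from the question's data; the mask stands in for the condition `row_number() == n() & Value > 10`.

```r
# Hypothetical values for illustration only
x <- c(10, 24, 33)
is_last_and_large <- c(FALSE, FALSE, TRUE)

# Negating the mask gives TRUE (coerced to 1) where the value is kept
# and FALSE (coerced to 0) where it should become 0
masked <- x * (!is_last_and_large)
print(masked)
```

The parentheses around the negated mask are optional here, but they make the intended grouping explicit.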
Upvotes: 1
Reputation: 25313
A possible solution, where Calculate is computed in the first mutate (and therefore outside if_else), so it can be an arbitrarily complicated calculation, as you say you need:
library(tidyverse)
ID <- c(1, 1, 1, 5, 5)
Period <- c(1,2,3,1,2)
Value <- c(10,12,11,4,6)
df <- data.frame(ID, Period, Value)
df %>%
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  mutate(Calculate = if_else(row_number() == n() & Value > 10, 0, Calculate)) %>%
  ungroup
#> # A tibble: 5 × 4
#>      ID Period Value Calculate
#>   <dbl>  <dbl> <dbl>     <dbl>
#> 1     1      1    10        10
#> 2     1      2    12        24
#> 3     1      3    11         0
#> 4     5      1     4         4
#> 5     5      2     6        12
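If the tidyverse is not available, the same rule can be sketched in base R. This is an alternative formulation, not the method used above; it assumes the rows are already in the desired order within each ID, and the helper name `is_last` is introduced here for illustration.

```r
# Same example data as above
df <- data.frame(ID     = c(1, 1, 1, 5, 5),
                 Period = c(1, 2, 3, 1, 2),
                 Value  = c(10, 12, 11, 4, 6))

# Step 1: the (possibly complicated) calculation, done once for every row
df$Calculate <- df$Period * df$Value

# Step 2: flag the last row of each ID; scanning from the end,
# the first occurrence of each ID is the one that is not a duplicate
is_last <- !duplicated(df$ID, fromLast = TRUE)

# Step 3: zero out only the last rows whose Value exceeds 10
df$Calculate <- ifelse(is_last & df$Value > 10, 0, df$Calculate)
print(df)
```

Keeping the calculation in its own step mirrors the two-mutate structure: the override only touches rows that meet the condition.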
Upvotes: 3