How to perform a multi-conditional replace in dplyr?

I have a dataset data.csv with around 180 variables (words) and 3000 samples (cases), and it looks like this (excerpt):

[Image: excerpt of the dataset]

I am running decorana and plotting the clusters with kmeans and fviz_cluster:

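library(vegan)       # decorana(), scores()
library(factoextra)  # fviz_cluster()

df <- read.csv("data.csv")

# Detrended correspondence analysis on the log1p-transformed data
DCA <- decorana(veg = log1p(df))

# Species (variable) scores on the DCA axes
species.scores <- as.data.frame(scores(DCA, "species"))

geom.text.size <- 1
theme.size <- (14/5) * geom.text.size

# k-means with 4 clusters on the species scores
set.seed(123)
km.res <- kmeans(species.scores, 4, nstart = 25)
fviz_cluster(km.res, geom = "text", data = species.scores, labelsize = 4)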

This results in a satisfying cluster graph:

[Image: cluster plot]

Is it possible to overlay the samples on top of this plot of variable clusters? That would show which samples fall into which cluster.

Any suggestions on how to achieve something like that?

Upvotes: 2

Answers (2)

akrun

Reputation: 886938

To set values to 0, multiply by a logical vector: FALSE is coerced to 0, so the product is 0, while TRUE is coerced to 1 and leaves the original value unchanged (assuming the column is numeric).

library(dplyr)
df %>% 
  mutate(Calculate = Period * Value) %>% 
  group_by(ID) %>% 
  mutate(Calculate = Calculate * !(row_number() == n() & Value > 10)) %>% 
  ungroup()

Output:

# A tibble: 5 × 4
     ID Period Value Calculate
  <dbl>  <dbl> <dbl>     <dbl>
1     1      1    10        10
2     1      2    12        24
3     1      3    11         0
4     5      1     4         4
5     5      2     6        12
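
The multiplication above works because R coerces logical values to numbers in arithmetic. A minimal base R sketch of just that step, using hypothetical values that are not from the question:

```r
# Hypothetical values, chosen only for illustration
x <- c(10, 24, 33)
zero_out <- c(FALSE, FALSE, TRUE)  # TRUE marks the elements to set to 0

# !zero_out is c(TRUE, TRUE, FALSE), which is coerced to c(1, 1, 0)
result <- x * (!zero_out)
print(result)
```

One caveat: NA * 0 is still NA in R, so a missing value in the column stays missing and is not turned into 0 by this approach.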

Upvotes: 1

PaulS

Reputation: 25313

A possible solution: compute Calculate in the first mutate, outside if_else, so it can be as complicated a calculation as you need, and then use if_else only to replace the value with 0 where the condition holds:

library(tidyverse)

ID <- c(1, 1, 1, 5, 5)
Period <- c(1,2,3,1,2)
Value <- c(10,12,11,4,6)
df <- data.frame(ID, Period, Value)

df %>% 
  mutate(Calculate = Period * Value) %>% 
  group_by(ID) %>% 
  mutate(Calculate = if_else(row_number() == n() & Value > 10, 0, Calculate)) %>% 
  ungroup()

#> # A tibble: 5 × 4
#>      ID Period Value Calculate
#>   <dbl>  <dbl> <dbl>     <dbl>
#> 1     1      1    10        10
#> 2     1      2    12        24
#> 3     1      3    11         0
#> 4     5      1     4         4
#> 5     5      2     6        12
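
If more than one replacement rule is needed, the same pattern could be written with case_when instead of nesting if_else calls. This is only a sketch: the second rule (Value < 5 becomes -1) is hypothetical and added purely to show how several conditions are listed; it is not part of the question.

```r
library(dplyr)

# Same example data as in the answer above
df <- data.frame(
  ID     = c(1, 1, 1, 5, 5),
  Period = c(1, 2, 3, 1, 2),
  Value  = c(10, 12, 11, 4, 6)
)

result <- df %>%
  mutate(Calculate = Period * Value) %>%
  group_by(ID) %>%
  mutate(Calculate = case_when(
    # Original rule: last row of the group with Value > 10 becomes 0
    row_number() == n() & Value > 10 ~ 0,
    # Hypothetical second rule, added only for illustration
    Value < 5 ~ -1,
    # Everything else keeps the value computed in the first mutate
    TRUE ~ Calculate
  )) %>%
  ungroup()

print(result)
```

Conditions are evaluated in order and the first match wins, so the more specific rules should come before the catch-all TRUE line.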

Upvotes: 3
