accibio

Reputation: 547

Removing duplicate values in a column of a dataframe in R without dropping the rows associated with the duplicated values

I have a data frame with 6 rows and 6 columns:

data <- data.frame(
  Unit = c("A", "A", "B", "B", "C", "C"),
  P1 = c(1:6),
  P2 = c(1:6),
  P3 = c(1:6),
  P4 = c(1:6),
  P5 = c(1:6),
  stringsAsFactors = FALSE)

I need to keep only the first occurrence of each value in the Unit column and blank out the repeats, while keeping every row and the values in the other columns. How do I achieve this in R?

Upvotes: 1

Views: 54

Answers (2)

akrun

Reputation: 886938

We may use duplicated() to find the repeated values in Unit and replace them with an empty string (""):

library(dplyr)
data %>%
    mutate(Unit = replace(Unit, duplicated(Unit), ""))
  Unit P1 P2 P3 P4 P5
1    A  1  1  1  1  1
2       2  2  2  2  2
3    B  3  3  3  3  3
4       4  4  4  4  4
5    C  5  5  5  5  5
6       6  6  6  6  6

Or with base R

data$Unit[duplicated(data$Unit)] <- ""
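
To see why both versions work, here is a minimal sketch of what duplicated() returns for the Unit column from the question. The names unit, dup and blanked are only illustrative and do not appear in the answer above.

```r
# Same values as the Unit column in the question
unit <- c("A", "A", "B", "B", "C", "C")

# duplicated() marks every element that has already appeared earlier,
# so the first occurrence of each value is FALSE and later repeats are TRUE
dup <- duplicated(unit)
print(dup)
#> [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# replace() overwrites only the flagged positions with "".
# The vector keeps its length, so no rows are lost when it is
# assigned back into the data frame.
blanked <- replace(unit, dup, "")
print(blanked)  # values: "A", "", "B", "", "C", ""
```

Because only the values in one column are overwritten, the other columns (P1 to P5) are left untouched.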

Upvotes: 1

Bruno

Reputation: 4151

This is quite simple, assuming the rows are already in the order you want, so that the first row of each Unit group is the one that should keep its label:

library(tidyverse)

data <- data.frame(
  Unit = c("A", "A", "B", "B", "C", "C"),
  P1 = c(1:6),
  P2 = c(1:6),
  P3 = c(1:6),
  P4 = c(1:6),
  P5 = c(1:6),
  stringsAsFactors = FALSE)


data %>%
  group_by(Unit) %>%
  mutate(Unit = if_else(row_number() == 1, Unit, "")) %>%
  ungroup()
#> # A tibble: 6 x 6
#>   Unit     P1    P2    P3    P4    P5
#>   <chr> <int> <int> <int> <int> <int>
#> 1 "A"       1     1     1     1     1
#> 2 ""        2     2     2     2     2
#> 3 "B"       3     3     3     3     3
#> 4 ""        4     4     4     4     4
#> 5 "C"       5     5     5     5     5
#> 6 ""        6     6     6     6     6

Created on 2021-08-10 by the reprex package (v2.0.1)
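
As a hedged illustration of the ordering caveat: if rows with the same Unit are not next to each other, one option is to sort first and then blank the repeats. This is a base R sketch under that assumption, not part of the answer above; the data frame shuffled and the result sorted are hypothetical names made up for the example.

```r
# Hypothetical input where equal Unit values are not adjacent
shuffled <- data.frame(
  Unit = c("B", "A", "B", "A"),
  P1 = 1:4,
  stringsAsFactors = FALSE)

# order() is stable, so rows with the same Unit keep their relative order
sorted <- shuffled[order(shuffled$Unit), ]

# Blank every repeat of Unit; all four rows are still present
sorted$Unit[duplicated(sorted$Unit)] <- ""

print(sorted)
```

After sorting, each label appears once at the top of its block of rows, which is the layout the question asks for.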

Upvotes: 1