Fred-LM

Reputation: 330

Compare rows and replace value if there is a difference

First of all: Happy New Year :)

I'm struggling with a loop so I'm now seeking your help.

Below is a short dummy:

df <- data.frame(name = c("a","a","b","b","c","d"), type = c(1,1,2,2,3,4), area = c("a","b","a","a","b","b"), length = c(10), power = c(10, 100))

I'd like to compare the rows within each unique combination of name, type and area, and see whether length and power vary. If they do not, I want to keep their value; if they do, I want to replace it with 'Unknown'. In the example above, the only replacement would be for name = b: length would remain '10' but power would become 'Unknown'. The resulting dataframe would therefore have only five rows.

That seems like a rather simple loop to come up with, but I haven't succeeded so far... do you have any idea?

Cheers,

Fred

Upvotes: 2

Views: 430

Answers (2)

Niek

Reputation: 1624

I think you don't need a for loop; you can use duplicated() instead. First, find the rows that have the same name, type, area and length as an earlier row but a different power value, and replace their power value with 'Unknown':

df[which(duplicated(df[1:4]) & !duplicated(df[1:5])), 'power'] <- 'Unknown'

Next, create a new dataframe that discards the other row:

df2 <- df[which(!duplicated(df[1:4], fromLast = TRUE)), ]

Output:

> df2
  name type area length   power
1    a    1    a     10      10
2    a    1    b     10     100
4    b    2    a     10 Unknown
5    c    3    b     10      10
6    d    4    b     10     100
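
To see why row 4 is the one that gets selected, it can help to look at the two duplicated() results separately. A small sketch on the question's df (the names same_key, same_row and flagged are just for illustration):

```r
df <- data.frame(name = c("a","a","b","b","c","d"), type = c(1,1,2,2,3,4),
                 area = c("a","b","a","a","b","b"), length = c(10), power = c(10, 100))

# TRUE where name, type, area and length repeat an earlier row
same_key <- duplicated(df[1:4])
# TRUE where the entire row, power included, repeats an earlier row
same_row <- duplicated(df[1:5])

# Rows that share their key with an earlier row but differ in power
flagged <- which(same_key & !same_row)
flagged  # 4
```

With exactly two rows per key, the flagged row is also the one kept by fromLast = TRUE. With three or more rows per key that need not hold, so this approach is best treated as specific to the two-row case.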

EDIT: Following additional requests from the OP here's a dplyr solution that works for more general cases.

library(dplyr)

# New dataframe containing multiple duplicates
df3 <- data.frame(name = c("a","a","b","b","b","c","d"),
                  type = c(1,1,2,2,2,3,4),
                  area = c("a","b","a","a","a","b","b"),
                  length = rep(10, 7),
                  power = c(10, 100, 10, 100, 100, 10, 100))

# Within each group, replace a column by "Unknown" if its values vary
df3 %>% 
  group_by(name, type, area) %>% 
  mutate(length = ifelse(n() > 1 && var(length) != 0, "Unknown", paste0(length)),
         power = ifelse(n() > 1 && var(power) != 0, "Unknown", paste0(power)))

The code first groups by name, type and area. For each group it then checks whether there is more than one row and, if so, whether the values vary. If both hold, it replaces all values in that group with "Unknown".

Output:

# A tibble: 7 x 5
# Groups:   name, type, area [5]
  name   type area  length power  
  <fct> <dbl> <fct> <chr>  <chr>  
1 a         1 a     10     10     
2 a         1 b     10     100    
3 b         2 a     10     Unknown
4 b         2 a     10     Unknown
5 b         2 a     10     Unknown
6 c         3 b     10     10     
7 d         4 b     10     100
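
If one row per combination is wanted, as in the question, the same "does it vary?" check could also be sketched in base R with aggregate(). This is an alternative rather than part of the dplyr code above, and collapse_value is a made-up helper name:

```r
df3 <- data.frame(name = c("a","a","b","b","b","c","d"),
                  type = c(1,1,2,2,2,3,4),
                  area = c("a","b","a","a","a","b","b"),
                  length = rep(10, 7),
                  power = c(10, 100, 10, 100, 100, 10, 100))

# Hypothetical helper: keep the value if it is constant within the group,
# otherwise return "Unknown"
collapse_value <- function(x) {
  if (length(unique(x)) > 1) "Unknown" else as.character(x[1])
}

# One row per name/type/area combination
res <- aggregate(df3[c("length", "power")],
                 by = df3[c("name", "type", "area")],
                 FUN = collapse_value)
res
```

The result has five rows, with power set to "Unknown" for name b. The row order may differ from the original data, since aggregate() sorts by the grouping columns.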

Upvotes: 2

tmfmnk

Reputation: 39858

With dplyr you can do:

library(dplyr)

df %>%
 group_by(name, type, area) %>%
 mutate(length = ifelse(length != first(length), "Unknown", paste0(length)),
        power = ifelse(power != first(power), "Unknown", paste0(power)))

  name   type area  length power  
  <fct> <dbl> <fct> <chr>  <chr>  
1 a        1. a     10     10     
2 a        1. b     10     100    
3 b        2. a     10     10     
4 b        2. a     10     Unknown
5 c        3. b     10     10     
6 d        4. b     10     100 

It checks whether each value is the same as in the first row for a given combination of "name", "type" and "area". Rows that differ are filled with "Unknown"; the first row of each combination keeps its value.
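
Because the comparison is against the first row, that row keeps its value even when the group varies (row 3 in the output above). If every row of a varying group should be flagged instead, one possible sketch uses base R's ave() to count distinct values per group. This is a different technique from the first() comparison, and n_vals and out are illustrative names:

```r
df <- data.frame(name = c("a","a","b","b","c","d"), type = c(1,1,2,2,3,4),
                 area = c("a","b","a","a","b","b"), length = c(10), power = c(10, 100))

# Number of distinct power values within each name/type/area combination
n_vals <- ave(df$power, df$name, df$type, df$area,
              FUN = function(x) length(unique(x)))

# Flag every row of a group whose power values vary
out <- df
out$power <- ifelse(n_vals > 1, "Unknown", as.character(df$power))
out$power
```

The same pattern could be repeated for length; on this data it would change nothing, since length is constant.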

Upvotes: 1
