Reputation: 330
First of all: Happy New Year :)
I'm struggling with a loop so I'm now seeking your help.
Below is a short dummy data frame:
df <- data.frame(name = c("a","a","b","b","c","d"), type = c(1,1,2,2,3,4), area = c("a","b","a","a","b","b"), length = c(10), power = c(10, 100))
I'd like to compare each unique combination of name, type and area, and see if length and power vary or not. If they do not, I want to keep their value; if they do, I want to replace their value by 'Unknown'.
In the example above, there would thus only be a replacement for name = b: length would remain '10' but power would become 'Unknown'. As a result, the resulting dataframe would only have five rows.
That seems like a rather simple loop to come up with, but I haven't succeeded so far... do you have any idea?
Cheers,
Fred
Upvotes: 2
Views: 430
Reputation: 1624
I think you don't need a for loop but can use duplicated.
First look up the rows that have the same name, type, area and length but do not have the same power value. Replace one of the power values with Unknown:
df[which(duplicated(df[1:4]) & !duplicated(df[1:5])), 'power'] <- 'Unknown'
Next, create a new dataframe that discards the other row:
df2 <- df[which(!duplicated(df[1:4], fromLast = TRUE)), ]
Output:
> df2
name type area length power
1 a 1 a 10 10
2 a 1 b 10 100
4 b 2 a 10 Unknown
5 c 3 b 10 10
6 d 4 b 10 100
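To make the two duplicated() masks easier to follow, here is a minimal sketch that rebuilds the dummy data from the question and inspects each mask separately. The names same_key, same_row and conflict are only illustrative and are not part of the original answer.

```r
# Recreate the dummy data from the question (power recycles to 10, 100, 10, ...)
df <- data.frame(name = c("a", "a", "b", "b", "c", "d"),
                 type = c(1, 1, 2, 2, 3, 4),
                 area = c("a", "b", "a", "a", "b", "b"),
                 length = 10,
                 power = c(10, 100))

# TRUE where name/type/area/length repeat an earlier row
same_key <- duplicated(df[1:4])
# TRUE where the whole row, power included, repeats an earlier row
same_row <- duplicated(df[1:5])

# Rows that repeat a key but carry a power value not seen in an earlier identical row
conflict <- same_key & !same_row
which(conflict)  # 4
```

Note that this mask only flags a row the first time a new power value appears for a key. With three or more rows per key (for instance powers 10, 100, 100) a later row can be a full duplicate of an earlier one and so is not flagged, which is probably why a grouped approach is more robust in the general case.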
EDIT: Following additional requests from the OP, here's a dplyr solution that works for more general cases.
library(dplyr)

# New dataframe containing multiple duplicates
df3 <- data.frame(name = c("a","a","b","b","b","c","d"),
                  type = c(1,1,2,2,2,3,4),
                  area = c("a","b","a","a","a","b","b"),
                  length = rep(10,7),
                  power = c(10, 100, 10, 100,100,10,100))

# Within each name/type/area group, replace a varying column by "Unknown"
df3 %>%
  group_by(name, type, area) %>%
  mutate(length = ifelse(n() > 1 && var(length) != 0, "Unknown", paste0(length)),
         power = ifelse(n() > 1 && var(power) != 0, "Unknown", paste0(power)))
The pipeline first groups by name, type and area. Within each group it checks whether there is more than one row and, if so, whether the values vary (non-zero variance). If both conditions hold, it replaces all values in that group with "Unknown".
Output:
# A tibble: 7 x 5
# Groups: name, type, area [5]
name type area length power
<fct> <dbl> <fct> <chr> <chr>
1 a 1 a 10 10
2 a 1 b 10 100
3 b 2 a 10 Unknown
4 b 2 a 10 Unknown
5 b 2 a 10 Unknown
6 c 3 b 10 10
7 d 4 b 10 100
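The question asks for one row per combination (five rows for the original example), whereas the output above keeps every row. A minimal sketch of one way to collapse the groups is to use summarise() instead of mutate(). The helper collapse_value is hypothetical, and n_distinct() is swapped in for the variance check, so this is a variation on the approach rather than the answer's own code.

```r
library(dplyr)

df3 <- data.frame(name = c("a", "a", "b", "b", "b", "c", "d"),
                  type = c(1, 1, 2, 2, 2, 3, 4),
                  area = c("a", "b", "a", "a", "a", "b", "b"),
                  length = rep(10, 7),
                  power = c(10, 100, 10, 100, 100, 10, 100))

# Hypothetical helper: the shared value if the group agrees, "Unknown" otherwise
collapse_value <- function(x) {
  if (n_distinct(x) > 1) "Unknown" else as.character(x[1])
}

# One output row per name/type/area combination
collapsed <- df3 %>%
  group_by(name, type, area) %>%
  summarise(length = collapse_value(length),
            power = collapse_value(power)) %>%
  ungroup()

collapsed
```

Under these assumptions, collapsed has five rows, and the row for name b carries length "10" and power "Unknown".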
Upvotes: 2
Reputation: 39858
With dplyr you can do:
library(dplyr)

df %>%
  group_by(name, type, area) %>%
  mutate(length = ifelse(length != first(length), "Unknown", paste0(length)),
         power = ifelse(power != first(power), "Unknown", paste0(power)))
name type area length power
<fct> <dbl> <fct> <chr> <chr>
1 a 1. a 10 10
2 a 1. b 10 100
3 b 2. a 10 10
4 b 2. a 10 Unknown
5 c 3. b 10 10
6 d 4. b 10 100
It checks whether each value is the same as in the first row for a given combination of "name", "type" and "area". Rows that differ are filled with "Unknown"; the first row of each group always keeps its own value, as row 3 in the output shows.
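If every row of a varying group should read "Unknown", including the first one, the comparison can be wrapped in any(). The following is only a sketch of that variation, using the dummy data from the question; the name flagged is illustrative.

```r
library(dplyr)

df <- data.frame(name = c("a", "a", "b", "b", "c", "d"),
                 type = c(1, 1, 2, 2, 3, 4),
                 area = c("a", "b", "a", "a", "b", "b"),
                 length = 10,
                 power = c(10, 100))

# Flag the whole group as soon as any value differs from the group's first value
flagged <- df %>%
  group_by(name, type, area) %>%
  mutate(length = if (any(length != first(length))) "Unknown" else as.character(length),
         power = if (any(power != first(power))) "Unknown" else as.character(power)) %>%
  ungroup()

flagged$power  # "10" "100" "Unknown" "Unknown" "10" "100"
```

Here both rows for name b end up with power "Unknown", while length stays "10" everywhere because it never varies within a group.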
Upvotes: 1