Ally D
Ally D

Reputation: 165

merge rows with conditional in if statement

I have a data frame that looks like:

df <- data.frame ("ID"=c(a, b, c, c, d, e, e), "K"= c(1, 1, 1, 1, 1, 1, 1), 
                       "P"=c(1, 1, 1, 1, 2, 2, 1), "C"=c(1, 2, 1, 2, 1, 2, 1))
  ID K P C
   a 1 1 1
   b 1 1 2
   c 1 1 1
   c 1 1 2
   d 1 2 1
   e 1 2 2
   e 1 1 1

I'm trying to collapse rows with the same ID value.

For rows that have no duplicated ID value I want that row to remain untouched.

For rows with a duplicated ID value (c & e) I want it to collapse the rows. For each of the columns (K, P, C) I want it to look at the values of each of the duplicated rows. If both values in the K column = 1 then the output should remain a 1. If both the values in the K column don't equal 1, then the output should be a -99. Same for column P and column C.

So ideally the output would look like:

  ID K  P   C
1  a 1  1   1
2  b 1  1   2
3  c 1  1  -99
5  d 1  2   1
6  e 1 -99 -99

So far I've only gotten to :

for (row in 1:nrow(df) ) {
  if length(df$ID) >=2



 }

But I'm pretty stuck because I don't even know if this is the correct way to begin.

Thanks!

Upvotes: 1

Views: 2209

Answers (2)

Peter Harrison
Peter Harrison

Reputation: 912

The plyr package is useful here.

 Original version

df <- data.frame ("ID"=c(1, 2, 3, 3, 4, 5, 5), "K"= c(1, 1, 1, 1, 1, 1, 1), 
                  "P"=c(1, 1, 1, 1, 2, 2, 1), "C"=c(1, 2, 1, 2, 1, 2, 1))
library(plyr)
ddply(df, .variables = .(ID),
      summarize,
      K = if (length(K) == 1) K else if (all(K == 1)) 1 else -99,
      P = if (length(P) == 1) P else if (all(P == 1)) 1 else -99,
      C = if (length(C) == 1) C else if (all(C == 1)) 1 else -99)

 Version that preserves additional columns

Note that this code assumes that your additional columns have identical values for all rows with the same ID.

df <- data.frame ("ID"=c(1, 2, 3, 3, 4, 5, 5), "K"= c(1, 1, 1, 1, 1, 1, 1), 
                  "P"=c(1, 1, 1, 1, 2, 2, 1), "C"=c(1, 2, 1, 2, 1, 2, 1),
                  "extra1" = c("a", "b", "c", "c", "d", "e", "e"),
                  "extra2" = c("A", "B", "C", "C", "D", "E", "E"))
library(plyr)
df1 <- ddply(df, .variables = .(ID),
             summarize,
             K = if (length(K) == 1) K else if (all(K == 1)) 1 else -99,
             P = if (length(P) == 1) P else if (all(P == 1)) 1 else -99,
             C = if (length(C) == 1) C else if (all(C == 1)) 1 else -99)
df2 <- merge(df1, unique(df[, setdiff(names(df), c("K", "P", "C"))]),
             by = "ID")

Upvotes: 1

bringtheheat
bringtheheat

Reputation: 90

here's a different approach using baseR:

df <- data.frame ("ID"=c(1, 2, 3, 3, 4, 5, 5), "K"= c(1, 1, 1, 1, 1, 1, 1), 
                  "P"=c(1, 1, 1, 1, 2, 2, 1), "C"=c(1, 2, 1, 2, 1, 2, 1))

dupl = with(df, duplicated(ID, K, P))
df$c = with(df, ifelse(dupl == TRUE & K == 1, -99, C)

Upvotes: 0

Related Questions