Reputation: 497
This is a short example of the dataframe that I am trying to clean:
L3 <- LETTERS[1:5]
fac<-c("fish", "meat", "chicken", "veg", "shrimp")
set.seed(1)
(d <- data.frame(code = sample(c(11:15)),
upc = sample(c(1:5)), desc = sample(fac),
desc1 = fac, desc2 = sample(fac),
desc3 = fac, desc4 = sample(fac) ))
  code upc    desc   desc1   desc2   desc3   desc4
1   12   5    meat    fish chicken    fish  shrimp
2   15   4    fish    meat  shrimp    meat    fish
3   14   2 chicken chicken     veg chicken    meat
4   13   3     veg     veg    fish     veg     veg
5   11   1  shrimp  shrimp    meat  shrimp chicken
I am trying to write a general function (using a for loop and unique()) that checks columns 3 to 7 independently for each row and keeps only the first occurrence of each value, blanking out any repeats in the other columns (e.g., if a row contains fish in all desc columns, the new row should contain fish in only one column). More specifically, the desired outcome is:
  code upc    desc desc1   desc2 desc3   desc4
1   12   5    meat  fish chicken        shrimp
2   15   4    fish  meat  shrimp
3   14   2 chicken           veg          meat
4   13   3     veg          fish
5   11   1  shrimp          meat       chicken
Upvotes: 1
Views: 38
Reputation: 887641
We can use duplicated to set the elements that are duplicated within each row to blank ("") for the 'desc' columns:
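# column positions whose names contain 'desc'
nm1 <- grep('desc', names(d))
# apply() returns one column per row here, so t() restores the row layout
d[nm1] <- t(apply(d[nm1], 1, function(x) replace(x, duplicated(x), "")))
d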
#  code upc    desc desc1   desc2 desc3   desc4
#1   12   5    meat  fish chicken        shrimp
#2   15   4    fish  meat  shrimp
#3   14   2 chicken           veg          meat
#4   13   3     veg          fish
#5   11   1  shrimp          meat       chicken
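To see what happens inside the apply() call, it may help to look at a single row on its own. A minimal sketch, using the 'desc' values from row 1 of the example as a plain character vector (the name x is just for illustration):

```r
# 'desc' values from row 1 of the example data
x <- c("meat", "fish", "chicken", "fish", "shrimp")

# TRUE marks every value that has already appeared earlier in the vector
dup <- duplicated(x)
dup
# FALSE FALSE FALSE  TRUE FALSE

# replace() overwrites only the flagged positions, keeping the first occurrence
cleaned <- replace(x, dup, "")
cleaned
# "meat" "fish" "chicken" "" "shrimp"
```

Because duplicated() never flags the first occurrence, each value survives exactly once per row and the column positions stay intact.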
Or using a for loop (assuming the columns are of class character, or are factors that already have blank as one of their levels, before doing the assignment):
for(i in seq_len(nrow(d))) d[i, nm1] <- replace(d[i, nm1],
duplicated(unlist(d[i, nm1])), '')
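The assumption about column class matters mostly on R versions before 4.0, where data.frame() converts strings to factors by default. A rough sketch of one way to handle that case: convert the columns to character first, then run the same kind of loop. The helper name blank_row_dups and the small data frame ex are hypothetical and only serve the illustration:

```r
# Small example with factor columns (forced here via stringsAsFactors = TRUE)
ex <- data.frame(code  = c(12, 15),
                 desc  = c("meat", "fish"),
                 desc1 = c("fish", "meat"),
                 desc2 = c("meat", "fish"),
                 stringsAsFactors = TRUE)

# Hypothetical helper: blank out values repeated within each row
blank_row_dups <- function(df, cols) {
  # Convert to character so "" can be assigned without an invalid-level warning
  df[cols] <- lapply(df[cols], as.character)
  for (i in seq_len(nrow(df))) {
    dup <- duplicated(unlist(df[i, cols]))
    # Only assign when the row actually contains a repeat
    if (any(dup)) df[i, cols[dup]] <- ""
  }
  df
}

nm1 <- grep("desc", names(ex))
ex <- blank_row_dups(ex, nm1)
ex
```

Converting up front avoids having to add "" as a level to every factor column; if the factor structure needs to be kept, adding the level before the loop would be the alternative.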
Upvotes: 2