user6640481

Remove duplicate records in dataframe

I have created a table in R with the values below:

Row 1 as ("Cat","Cat","Cow",NA)
Row 2 as ("Cat","Cow","Cat",NA)
Row 3 as ("Cat","Cat",NA,NA)

I need the final output to have the duplicate values within each row removed, and the NA values removed as well. The output should read as follows:

Row 1 as ("Cat","Cow")
Row 2 as ("Cat","Cow")
Row 3 as ("Cat"," ")

Upvotes: 1

Views: 89

Answers (1)

akrun

Reputation: 886938

We can use apply to loop over the rows (MARGIN = 1) and keep only the elements that are neither duplicates (!duplicated(x)) nor NA (!is.na(x)). The output is a list when the rows have different numbers of elements left after the removal. To convert it back to a matrix, we can pad the shorter rows with blank values at the end using stri_list2matrix (from the stringi package).

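# For each row, keep the non-NA values that have not already appeared in that row
lst <- apply(df1, 1, FUN = function(x) x[!is.na(x) & !duplicated(x)])
library(stringi)
# Pad the shorter rows with '' and bind the list into a matrix, one list element per row
stri_list2matrix(lst, fill = '', byrow = TRUE)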
#     [,1]  [,2] 
#[1,] "Cat" "Cow"
#[2,] "Cat" "Cow"
#[3,] "Cat" ""   
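
For anyone who wants to try this without installing stringi, here is a minimal base R sketch of the same idea. The question does not show how the table was built, so the df1 below is only an assumed reconstruction (the column names V1 to V4 are hypothetical). It also swaps apply for lapply over the row indices, so the intermediate result is always a list, even when every row ends up with the same number of elements.

```r
# Assumed reconstruction of the table from the question; column names are hypothetical.
df1 <- data.frame(
  V1 = c("Cat", "Cat", "Cat"),
  V2 = c("Cat", "Cow", "Cat"),
  V3 = c("Cow", "Cat", NA),
  V4 = rep(NA_character_, 3),
  stringsAsFactors = FALSE
)

m <- as.matrix(df1)

# For each row, keep the non-NA values that have not already appeared in that row.
# unname() drops the column names carried over from the matrix.
lst <- lapply(seq_len(nrow(m)), function(i) {
  x <- m[i, ]
  unname(x[!is.na(x) & !duplicated(x)])
})

# Pad every row with "" up to the longest row, then stack the rows into a matrix.
width <- max(lengths(lst))
res <- do.call(rbind, lapply(lst, function(x) c(x, rep("", width - length(x)))))
res
```

With the assumed data, res should be a 3 x 2 character matrix with the same contents as the stri_list2matrix output. The padding step plays the role of fill = '' and the rbind plays the role of byrow = TRUE.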

Upvotes: 3
