Reputation: 97
I have a dataframe with 500k rows and about 130 columns. I want to filter out duplicate rows for all columns except one (column 128). I tried:
df <- unique(df[,-128])
df <- df[!duplicated(df[, -128])]
df <- distinct(df, -column128)
I get the same error over and over again:
Error in paste(..............., : formal argument "sep" matched by multiple actual arguments
I also tried typing out every single column, but got the same error. If I try the above for only the first 9 columns, the error doesn't appear. However, if I try the same for 10 columns, I get the error. Is there a limit on the number of columns when removing duplicated rows? Or does anyone have a solution?
The df looks as follows (column 128 = label):
'data.frame': 571262 obs. of 139 variables:
$ x : num 1 1 1 1 0 0 0 7 7 7 ...
$ jan : num 0 0 0 0 0 0 0 0 0 0 ...
$ feb : num 0 0 0 0 0 0 0 0 0 0 ...
$ mrt : num 0 0 0 0 0 0 0 0 0 0 ...
$ apr : num 0 0 0 0 0 0 0 0 0 0 ...
$ mei : num 0 0 0 0 0 0 0 0 0 0 ...
$ jun : num 0 0 0 0 0 0 0 0 0 0 ...
$ jul : num 0 0 0 0 0 0 0 0 0 0 ...
$ aug : num 1 1 0 0 0 0 0 0 0 0 ...
$ sep : num 0 0 1 1 0 0 0 0 0 0 ...
$ okt : num 0 0 0 0 1 1 1 0 0 0 ...
$ nov : num 0 0 0 0 0 0 0 1 1 1 ...
$ dec : num 0 0 0 0 0 0 0 0 0 0 ...
$ - 1 : num 0 0 1 1 1 ...
$ - 2 : num 0 0 0 0 1 ...
$ - 3 : num 0 0 0 0 0 ...
$ - 4 : num 0 0 0 0 0 0 0 0 0 0 ...
......
$ - 114 : num 0 0 0 0 0 0 0 0 0 0 ...
$ label : int 8 12 8 12 8 10 12 8 10 12 ...
$ 2008 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2009 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2010 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2011 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2012 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2013 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2014 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2015 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2016 : num 1 1 1 1 1 1 1 1 1 1 ...
$ 2017 : num 0 0 0 0 0 0 0 0 0 0 ...
$ 2018 : num 0 0 0 0 0 0 0 0 0 0 ...
Upvotes: 0
Views: 1171
Reputation: 33940
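It looks like your month column 'sep' is colliding with the sep argument of paste(..., sep), which duplicated() appears to call internally on data frames in your R version. That is what 'formal argument "sep" matched by multiple actual arguments' is telling you. It also explains why the first 9 columns (x, jan, ..., aug) work and adding the 10th (sep) fails, so there is no limit on the number of columns.
It is unlikely that you have 2+ columns called 'sep', but you can check with which(names(df)=='sep').
A workaround is to rename your column 'sep' to something else, e.g. 'spt'.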
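To make the collision concrete, here is a minimal sketch on a toy data frame. The column names and values are made up for illustration, and the do.call(paste, ...) line is only my assumption about what happens internally in the affected R versions. It reproduces the error message directly and then applies the rename workaround:

```r
# Toy data (hypothetical): one column is named "sep", as in the question.
df <- data.frame(
  aug   = c(1, 1, 0, 0),
  sep   = c(0, 0, 1, 1),
  label = c(8L, 12L, 8L, 12L)
)

# Reproduce the mechanism: the column named "sep" and the explicit
# sep = "\r" both match paste()'s formal argument `sep`.
msg <- tryCatch(
  do.call(paste, c(df, sep = "\r")),
  error = function(e) conditionMessage(e)
)
print(msg)

# Workaround: rename the column, then deduplicate on everything except `label`.
names(df)[names(df) == "sep"] <- "spt"
key_cols <- setdiff(names(df), "label")
deduped <- df[!duplicated(df[, key_cols, drop = FALSE]), ]
print(deduped)
```

Selecting the key columns by name rather than by position (-128) keeps the code working if the column order changes. You can rename 'spt' back to 'sep' afterwards if you need the original name.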
Upvotes: 1
Reputation: 17648
You can try a tidyverse solution using the filter_at function.
library(tidyverse)
set.seed(14)
df <- data.frame(a=sample(1:4, 5, T),b=sample(1:4, 5, T), d=1:5)
df
a b d
1 2 3 1
2 3 4 2
3 4 2 3
4 3 2 4
5 4 2 5
df %>% filter_at(vars(-3), all_vars(!duplicated(.)))
a b d
1 2 3 1
2 3 4 2
3 4 2 3
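One caveat worth checking before using this on real data: as far as I can tell, all_vars(!duplicated(.)) tests each column on its own, so a row is dropped as soon as any single value has appeared earlier in its column. That is stricter than removing rows whose whole combination of values is repeated. The sketch below hard-codes the values from the table printed above (so it does not depend on the random seed) and compares the two behaviours in base R:

```r
# Same values as the example table above, typed in by hand.
df <- data.frame(a = c(2, 3, 4, 3, 4), b = c(3, 4, 2, 2, 2), d = 1:5)

# Column-wise test (what the filter_at call above amounts to):
# keep a row only if its value is new in column a AND new in column b.
col_wise <- df[!duplicated(df$a) & !duplicated(df$b), ]

# Row-wise test: keep a row unless the (a, b) combination was seen before.
row_wise <- df[!duplicated(df[, c("a", "b")]), ]

print(col_wise)  # rows 1 to 3
print(row_wise)  # rows 1 to 4; only row 5 repeats an earlier (a, b) pair
```

Row 4 (a = 3, b = 2) is not a duplicate of any earlier row, yet the column-wise version drops it. For the question as asked (duplicates across all columns except the label), the row-wise form is probably what is wanted.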
Upvotes: 0