Reputation: 13
I have a data set with over 6k columns. Each result has columns with data that are always the same.
XCODE Age Sex ResultA Sex ResultB
1 X001 12 2 2 2 4
2 X002 23 2 4 2 66
3 X003 NA NA NA NA NA
4 X004 32 1 1 1 3
5 X005 NA NA NA NA NA
6 X001 NA NA NA NA NA
7 X002 NA NA NA NA NA
8 X003 33 1 8 1 6
9 X004 NA NA NA NA NA
10 X005 55 2 8 2 8
I would like to remove the duplicated columns, e.g. the repeated Sex variable. Is there a way to do that with data.table?
Upvotes: 1
Views: 44
Reputation: 28675
You can use match if you need to check for equality of all values.
df2 <- df[, unique(match(df, df)), with = FALSE]
df2
# XCODE Age Sex ResultA ResultB
# 1 X001 12 2 2 4
# 2 X002 23 2 4 66
# 3 X003 NA NA NA NA
# 4 X004 32 1 1 3
# 5 X005 NA NA NA NA
# 6 X001 NA NA NA NA
# 7 X002 NA NA NA NA
# 8 X003 33 1 8 6
# 9 X004 NA NA NA NA
# 10 X005 55 2 8 8
Data used:
library(data.table)
df <- fread('
XCODE Age Sex ResultA Sex ResultB
1 X001 12 2 2 2 4
2 X002 23 2 4 2 66
3 X003 NA NA NA NA NA
4 X004 32 1 1 1 3
5 X005 NA NA NA NA NA
6 X001 NA NA NA NA NA
7 X002 NA NA NA NA NA
8 X003 33 1 8 1 6
9 X004 NA NA NA NA NA
10 X005 55 2 8 2 8
')[, -'V1']
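To see why the match-based selection works, it may help to look at the intermediate values on a smaller table. This is only a sketch: the table dt and the names first_match and keep are made up for illustration and are not part of the answer above.

```r
library(data.table)

# Hypothetical toy table: columns a and c hold identical values
dt <- data.table(a = 1:3, b = c("x", "y", "z"), c = 1:3)

# For each column, the position of the first column with the same content
first_match <- match(dt, dt)

# Keep one position per distinct column content
keep <- unique(first_match)

# Select the remaining columns by position
res <- dt[, keep, with = FALSE]
names(res)
```

Because a and c have the same content, c is matched to position 1 and only a and b are kept. Selecting by position rather than by name is what lets this work when several columns share a name.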
Upvotes: 2
Reputation: 639
If you have duplicated columns with different names, you can transpose your dataframe, which allows you to use the unique
function to solve your problem. You then transpose it back and convert it back to a dataframe (because it became a matrix when you transposed it).
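df = data.frame(c = 1:5, a = c("A", "B","C","D","E"), b = 1:5)  # c and b are identical
df = t(df)           # columns become rows; the result is a matrix
df = unique(df)      # drop duplicated rows (the former columns)
df = t(df)           # transpose back
df = data.frame(df)  # convert the matrix back to a dataframe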
Edit: as markus points out, this is probably not a good option if you have columns of multiple types, because when t() coerces your dataframe to a matrix it also coerces all your variables to the same type.
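The coercion can be seen by comparing column classes before and after the round trip. The following is a rough sketch, not a recommendation: the names orig and out are made up here, and restoring the types with type.convert is an added assumption that may not recover every original type.

```r
# Same toy data as above: c and b are identical, a is character
orig <- data.frame(c = 1:5, a = c("A", "B", "C", "D", "E"), b = 1:5,
                   stringsAsFactors = FALSE)

# Transpose, drop duplicated rows, transpose back, rebuild the dataframe
out <- data.frame(t(unique(t(orig))), stringsAsFactors = FALSE)

sapply(orig, class)  # classes before the round trip
sapply(out, class)   # every column is now character

# Assumption: let R guess the column types again from the text values
out[] <- lapply(out, type.convert, as.is = TRUE)
sapply(out, class)
```

With only integer and character columns the guess is straightforward, but dates, factors and other classes would need to be restored by hand.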
Upvotes: 1
Reputation: 4480
Try this:
df[, unique(colnames(df))]
One caveat: it removes every column whose name is duplicated, keeping only the first one. In your case, it will drop the second Sex
column even if the two columns have the same name but different content.
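Since the question asks about data.table, here is a sketch of the same name-based idea written with column positions and with = FALSE, which is an assumption about how one might adapt the line above rather than part of this answer. The table dt is hypothetical, and its second Sex column deliberately holds different values to show the caveat.

```r
library(data.table)

# Hypothetical table with two columns named Sex but different content
dt <- data.table(id = 1:2, Sex = c(1L, 2L), Sex = c(9L, 9L))

# Positions of the first occurrence of each column name
keep <- which(!duplicated(names(dt)))

# Select by position; the second Sex column is dropped despite differing
res <- dt[, keep, with = FALSE]
names(res)
```

The second Sex column is lost even though its values differ from the first, so this is only safe when equal names are known to imply equal content.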
Upvotes: 1