Reputation: 193
I can't figure out the difference between unique() and duplicated() in R. I have a data.frame, and I want to remove the rows corresponding to duplicated values in a given column:
   Acc          Probe             Coord_homol
1  NR_004442.1  225541_at~122     391
2  NM_028059.2  241348_at~444     4642
3  NM_028059.2  241348_at~468     4666
4  NM_001114    212306_at~4357    5034
5  NM_010573.2  230472_at~402     1987
6  NM_029633.2  212306_at~4357    4289
7  NM_00108196  212306_at~4357    4292
8  NM_029891.2  205004_at~3421    2963
9  NM_029891.2  205004_at~3635    3173
10 NM_007892.2  221586_s_at~1356  1257
11 NR_036613.1  208672_s_at~829   1301
12 NR_036613.1  208673_s_at~1472  1854
13 NM_011078.3  212726_at~3872    5175
14 NM_011078.3  212726_at~3887    5190
15 NM_013915.3  207164_s_at~1523  2911
In this case, I would like to remove row 7 because its probe is the same as in row 6 (rows with the same probe do not have to be consecutive).
I first tried unique() and later found duplicated(). The two commands below return the same number of rows, but the resulting data.frames are not the same:
dat[!duplicated(dat$probe),]
dat[unique(dat$probe),]
I then tried a much simpler case, using this data.frame:
> dat
   probe val
1    aaa  10
2    bbb  12
3    ccc  45
4    ddd  32
5    aaa  42
6    eee  10
7    fff  13
8    ccc  85
9    aaa  75
10   ddd  64
Using !duplicated() seems to give what I want:
dat[!duplicated(dat$probe),]
  probe val
1   aaa  10
2   bbb  12
3   ccc  45
4   ddd  32
6   eee  10
7   fff  13
Using unique():
dat[unique(dat$probe),]
I get:
  probe val
1   aaa  10
2   bbb  12
3   ccc  45
4   ddd  32
5   aaa  42
6   eee  10
This is not what I want. So what exactly is unique() doing?
Thanks for your help.
Upvotes: 1
Views: 4442
Reputation: 10478
unique() is returning a factor here, and the factor's underlying integer codes are being used for indexing rather than its labels.
uni <- unique(dat$probe)
str(uni)
Factor w/ 6 levels "aaa","bbb","ccc",..: 1 2 3 4 5 6
It is like you are doing this:
nums <- as.numeric(unique(dat$probe))
dat[nums,]
  probe val
1   aaa  10
2   bbb  12
3   ccc  45
4   ddd  32
5   aaa  42
6   eee  10
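To check this yourself, here is a minimal sketch that rebuilds the example data and compares the two ways of indexing. The stringsAsFactors = TRUE argument is an assumption on my part so that probe is a factor as in the question (it is no longer the default in R 4.0 and later), and the names codes, by_factor and by_codes are only illustrative.

```r
# Rebuild the example data from the question.
# Assumption: probe is stored as a factor, hence stringsAsFactors = TRUE.
dat <- data.frame(
  probe = c("aaa", "bbb", "ccc", "ddd", "aaa", "eee", "fff", "ccc", "aaa", "ddd"),
  val   = c(10, 12, 45, 32, 42, 10, 13, 85, 75, 64),
  stringsAsFactors = TRUE
)

uni   <- unique(dat$probe)   # factor holding the six distinct probes
codes <- as.integer(uni)     # the integer codes stored behind the labels

by_factor <- dat[uni, ]      # row indexing with the factor itself
by_codes  <- dat[codes, ]    # row indexing with the explicit integer codes

# Compare the two results; they should select the same rows.
all.equal(by_factor, by_codes)
```

Because the codes are just positions in the sorted level set, they have nothing to do with where each probe first occurs in the data.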
unique() is returning a factor because we are putting a factor into it in this case. It doesn't always return factors. For example, unique(as.character(dat$probe)) would return characters.
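Note that converting to character does not make unique() usable as a row index either, because character indices are matched against the row names, not against the values in probe. The sketch below illustrates this under the same assumption as before (probe stored as a factor); by_label, first_pos and dedup are illustrative names, and the match() variant is just one possible alternative to !duplicated().

```r
# Same example data; stringsAsFactors = TRUE is an assumption so that
# probe is a factor, as in the question.
dat <- data.frame(
  probe = c("aaa", "bbb", "ccc", "ddd", "aaa", "eee", "fff", "ccc", "aaa", "ddd"),
  val   = c(10, 12, 45, 32, 42, 10, 13, 85, 75, 64),
  stringsAsFactors = TRUE
)

# unique() keeps the type of its input.
class(unique(dat$probe))                 # "factor"
class(unique(as.character(dat$probe)))   # "character"

# Character indices are looked up in the row names ("1", "2", ...),
# so probe labels such as "aaa" match no row at all.
by_label <- dat[unique(as.character(dat$probe)), ]
all(is.na(by_label$val))

# To keep the first row of each probe, index by position instead:
# match() gives the first position of each distinct value.
first_pos <- match(unique(dat$probe), dat$probe)
dedup <- dat[first_pos, ]

# Compare with the !duplicated() approach from the question.
all.equal(dedup, dat[!duplicated(dat$probe), ])
```

In practice dat[!duplicated(dat$probe), ] is the shorter way to write this; the match() form only makes explicit that rows have to be selected by position or by a logical mask, not by the values themselves.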
Upvotes: 3