shregam
shregam

Reputation: 13

Selecting a specific row with duplicate values in a column but a blank value in one of the columns of the duplicate rows in R

I'm new to stack overflow. I'm sorry if the title is not comprehensible. I'll try to do a better job next time.

I have a data frame with duplicate values in the first column but the last column in their respective rows rows differ(few are blank and other might have some value in them). I have been trying to eliminate the duplicate with blank field in last column but every time I use 'duplicated()' or 'unique()' function, I'm only left with the row with blank field. This is possibly happening because the rows are sorted in such a way that the blank field is above the row which has a value in last column.

For example, Input:

 *No.   per         hmm        qty*
  1    BARBIE123     new         1
  2    AYLIK32     * N/A*        1
  3    AYLIK32       new         1
  4     BARB         mid         1

Output I keep getting:

*No.   per      hmm        qty*
1    BARBIE    new         1
2    AYLIK   * N/A*        1
4     BARB     mid         1

Desired output:

*No.   per      hmm        qty*
1     BARBIE    new         1
3     AYLIK     new         1
4     BARB      mid         1

Is there a way I could specify which row to keep and which row to remove?

Appreciate all the help. Kindly let me know if the question is not comprehensible. Thanks in advance.

Upvotes: 1

Views: 140

Answers (2)

CPak
CPak

Reputation: 13581

Maybe this will work for you

df <- read.table(text="No.   per         hmm        qty*
1    BARBIE123     new         1
2    AYLIK32       N/A        1
3    AYLIK32       new         1
4     BARB         mid         1", header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
df %>%
  filter(!duplicated(per, fromLast=TRUE))

  # No.       per hmm qty.
# 1   1 BARBIE123 new    1
# 2   3   AYLIK32 new    1
# 3   4      BARB mid    1

Upvotes: 0

leeum
leeum

Reputation: 274

Hm, unsure if this is what you want. This will remove all rows with a NA.

df <- df[complete.cases(df), ]

Lets say you have NAs in other columns and you dont want those taken into account. To only look at NAs in a single column, use this:

df <- df[complete.cases(df[,3]), ]

Where 3 is the # of the column you want to look for NAs in.

Upvotes: 1

Related Questions