Reputation: 13
I'm new to stack overflow. I'm sorry if the title is not comprehensible. I'll try to do a better job next time.
I have a data frame with duplicate values in the first column but the last column in their respective rows rows differ(few are blank and other might have some value in them). I have been trying to eliminate the duplicate with blank field in last column but every time I use 'duplicated()' or 'unique()' function, I'm only left with the row with blank field. This is possibly happening because the rows are sorted in such a way that the blank field is above the row which has a value in last column.
For example, Input:
*No. per hmm qty*
1 BARBIE123 new 1
2 AYLIK32 * N/A* 1
3 AYLIK32 new 1
4 BARB mid 1
Output I keep getting:
*No. per hmm qty*
1 BARBIE new 1
2 AYLIK * N/A* 1
4 BARB mid 1
Desired output:
*No. per hmm qty*
1 BARBIE new 1
3 AYLIK new 1
4 BARB mid 1
Is there a way I could specify which row to keep and which row to remove?
Appreciate all the help. Kindly let me know if the question is not comprehensible. Thanks in advance.
Upvotes: 1
Views: 140
Reputation: 13581
Maybe this will work for you
df <- read.table(text="No. per hmm qty*
1 BARBIE123 new 1
2 AYLIK32 N/A 1
3 AYLIK32 new 1
4 BARB mid 1", header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
df %>%
filter(!duplicated(per, fromLast=TRUE))
# No. per hmm qty.
# 1 1 BARBIE123 new 1
# 2 3 AYLIK32 new 1
# 3 4 BARB mid 1
Upvotes: 0
Reputation: 274
Hm, unsure if this is what you want. This will remove all rows with a NA.
df <- df[complete.cases(df), ]
Lets say you have NAs in other columns and you dont want those taken into account. To only look at NAs in a single column, use this:
df <- df[complete.cases(df[,3]), ]
Where 3 is the # of the column you want to look for NAs in.
Upvotes: 1