SWR
SWR

Reputation: 505

How to remove records in a dataframe

I need to remove specific rows of my dataframe, but I have troubles doing it. The dataset looks like this:

> head(mergedmalefemale)
  coupleid gender shop time amount
1        1      W    3    1  29.05
2        1      W    1    2  31.65
3        1      W    3    3     NA
4        1      W    2    4  17.75
5        1      W    3    5 -28.40
6        2      W    1    1  42.30

What I would like to do is deleting all the records of a coupleid where at least one amount is NA or negative. In the example above, all rows with coupleid "1" should be deleted as there are rows with negative values and NA's. I tried it with functions like na.omit(mergedmalefemale) etc. but this deletes only the rows with NA's but not other rows with the same cupleid. As I am a beginner I'd be happy if someone could help me.

Upvotes: 0

Views: 155

Answers (4)

thelatemail
thelatemail

Reputation: 93813

Another good opportunity to apply data.table

require(data.table)
mergedmalefemale <- as.data.table(mergedmalefemale)
mergedmalefemale[, if(!any(is.na(amount) | amount < 0)) .SD, by=coupleid]

#   coupleid gender shop time amount
#1:        2      W    1    1   42.3

Upvotes: 1

Hugh
Hugh

Reputation: 16089

Here's a fairly dirty way

# identify the coupleids that need to stay/be removed
agg <- aggregate(amount ~ coupleid, data=mergedmalefemale, FUN=function(x) min(is.na(x)|(x>0)))

# insert a column alongside "amount.y" that puts a 0 next to rows to be deleted
df.1 <- merge(mergedmalefemale, agg, by="coupleid")

# delete the rows
df.1 <- df.1[df.1$amount.y == 1, ]

Upvotes: 0

shadow
shadow

Reputation: 22293

Since you do not want to only omit the amounts that are NA or negative, but want to omit all data with the same id, you have to first find the id's you want to remove and then remove them.

mergedmalefemale <- read.table(text="
    coupleid gender shop time amount
    1        1      W    3    1  29.05
    2        1      W    1    2  31.65
    3        1      W    3    3     NA
    4        1      W    2    4  17.75
    5        1      W    3    5 -28.40
    6        2      W    1    1  42.30", 
    header=TRUE)

# Find NA and negative amounts
del <- is.na(mergedmalefemale[,"amount"]) | mergedmalefemale[,"amount"]<0
# Find coupleid with NA or negative amounts
ids <- unique(mergedmalefemale[del,"coupleid"])
# Remove data with coupleid such that amount is NA or negative
mergedmalefemale[!mergedmalefemale[,"coupleid"] %in% ids,]

Upvotes: 2

Jilber Urbina
Jilber Urbina

Reputation: 61154

Here's one alternative. Consider your data.frame is called df

> na.omit(df[ rowSums(df[, sapply(df, is.numeric)]< 0, na.rm=TRUE)  ==0, ])
  coupleid gender shop time amount
1        1      W    3    1  29.05
2        1      W    1    2  31.65
4        1      W    2    4  17.75
6        2      W    1    1  42.30

Upvotes: 1

Related Questions