Reputation: 21
I have a data frame (called A in example below) that looks like this subset:
Survey HaulNo Year Species Unsexed Males Females HaulUnique Lat_long
FRGF 1 2000 134567 NA 4 NA 1_2000 50.7_-2.5
FRGF 1 2000 134567 NA NA 5 1_2000 50.7_-2.5
FRGF 2 2003 134578 10 NA NA 2_2003 49.5_-1.5
FRGF 3 1998 123557 NA NA 7 3_1998 50.1_-0.5
FRGF 3 1998 123557 NA 3 NA 3_1998 50.1_-0.5
I would like to merge these rows so that they look like the data below:
Survey HaulNo Year Species Unsexed Males Females HaulUnique Lat_long
FRGF 1 2000 134567 NA 4 5 1_2000 50.7_-2.5
FRGF 2 2003 134578 10 NA NA 2_2003 49.5_-1.5
FRGF 3 1998 123557 NA 3 7 3_1998 50.1_-0.5
Essentially, I want to merge rows so that the values in the columns "Unsexed", "Males" and "Females" all sit in one row. At present this data is split, and duplicate information about the same species, haul etc. occurs on multiple rows. It is essential that everything else is maintained and kept unique when I merge the rows, as each row (once merged) represents a unique haul.
I don't want to apply any sort of sum/mean/other function to these 3 columns, and I want to keep all my other variables the same. I also don't want to create any new columns, and I would like to keep the NAs if possible.
NB. Given I have a huge dataset, I am not always aware a) which rows are semi-duplicated and b) which combination of Unsexed/Males/Females each row has information on.
I have tried a variety of ways to do this (aggregate, ddply, cast), none of which I've got to work, partly because, as a beginner in R, I have struggled to understand the functions and apply them to my data.
Thanks in advance.
Upvotes: 2
Views: 2282
Reputation: 18487
Something like this, where A is the data frame from the question:
aggregate(
  A[, c("Unsexed", "Males", "Females")],
  by = A[, c("Survey", "HaulNo", "Year", "Species", "HaulUnique", "Lat_long")],
  FUN = sum,
  na.rm = TRUE
)
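One caveat with the call above: sum() with na.rm = TRUE returns 0, not NA, for a group whose values are all NA, so the NAs the question asks to keep would be lost. Below is a minimal sketch of a variant that keeps them. It assumes each group has at most one non-NA value per count column, as in the sample data; if a group had several, only the first would be kept. The data frame is rebuilt by hand from the rows in the question, and the helper name first_non_na is made up for illustration.

```r
# Sample data rebuilt from the rows shown in the question.
A <- data.frame(
  Survey     = "FRGF",
  HaulNo     = c(1, 1, 2, 3, 3),
  Year       = c(2000, 2000, 2003, 1998, 1998),
  Species    = c(134567, 134567, 134578, 123557, 123557),
  Unsexed    = c(NA, NA, 10, NA, NA),
  Males      = c(4, NA, NA, NA, 3),
  Females    = c(NA, 5, NA, 7, NA),
  HaulUnique = c("1_2000", "1_2000", "2_2003", "3_1998", "3_1998"),
  Lat_long   = c("50.7_-2.5", "50.7_-2.5", "49.5_-1.5", "50.1_-0.5", "50.1_-0.5")
)

value_cols <- c("Unsexed", "Males", "Females")
key_cols   <- setdiff(names(A), value_cols)  # every other column identifies a haul

# Hypothetical helper: the first non-NA value in x, or NA if there is none.
# Assumes at most one non-NA value per group; any extras are silently dropped.
first_non_na <- function(x) x[!is.na(x)][1]

merged <- aggregate(A[value_cols], by = A[key_cols], FUN = first_non_na)

# aggregate() puts the grouping columns first; restore the original column order.
merged <- merged[names(A)]
print(merged)
```

No summing is involved, no new columns are created, and the grouping columns pass through unchanged, so each haul ends up on a single row.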
Upvotes: 2