kmalts

Reputation: 21

Merge semi-duplicated rows in R

I have a data frame (called A in the example below) that looks like this subset:

Survey HaulNo Year Species Unsexed Males Females HaulUnique Lat_long  
FRGF      1   2000  134567      NA     4      NA     1_2000     50.7_-2.5 
FRGF      1   2000  134567      NA    NA       5     1_2000     50.7_-2.5 
FRGF      2   2003  134578      10    NA      NA     2_2003     49.5_-1.5 
FRGF      3   1998  123557      NA    NA       7     3_1998     50.1_-0.5
FRGF      3   1998  123557      NA     3      NA     3_1998     50.1_-0.5 

I would like to merge these rows so that they look like the data below:

Survey HaulNo Year Species Unsexed Males Females HaulUnique Lat_long
FRGF      1   2000  134567      NA     4       5     1_2000     50.7_-2.5
FRGF      2   2003  134578      10    NA      NA     2_2003     49.5_-1.5
FRGF      3   1998  123557      NA     3       7     3_1998     50.1_-0.5

Essentially, I want to merge rows so that the information in the columns "Unsexed", "Males" and "Females" sits in a single row. At the moment this data is split, and the same species and haul information is duplicated across multiple rows. It is essential that everything else is preserved and kept unique when I merge the rows, as each merged row represents a unique haul.

I don't want to apply any sort of sum/mean/other function to these 3 columns, and I want to keep all my other variables the same. I also do not want to create any additional columns, and I would like to keep the NAs if possible.

NB. Because the dataset is huge, I do not always know a) which rows are semi-duplicated and b) which combination of Unsexed/Males/Females each row has information on.

I have tried a variety of ways to do this (aggregate, ddply, cast), none of which I have got to work, partly because, as a beginner in R, I have struggled to understand these functions and apply them to my data.

Thanks in advance.

Upvotes: 2

Views: 2282

Answers (1)

Thierry

Reputation: 18487

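Something like the following, where df is your data frame (A in the question). Note that with na.rm = TRUE, sum() returns 0 rather than NA for a group whose values are all NA:

aggregate(
  # columns to collapse within each haul
  df[, c("Unsexed", "Males", "Females")],
  # grouping columns that together identify a unique haul
  by = df[, c("Survey", "HaulNo", "Year", "Species", "HaulUnique", "Lat_long")],
  FUN = sum,
  na.rm = TRUE  # ignore NAs when summing
)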
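If the NAs need to survive the merge, one possible variation is to swap sum for a function that just picks the existing value. The following is only a sketch under assumptions: the data frame is rebuilt by hand from the example in the question, first_non_na is a hypothetical helper (not a base R function), and it assumes each haul has at most one non-NA value per column. If a haul had two different values in the same column, the second would be silently dropped.

```r
# Example data rebuilt from the question (called A there).
A <- data.frame(
  Survey     = "FRGF",
  HaulNo     = c(1, 1, 2, 3, 3),
  Year       = c(2000, 2000, 2003, 1998, 1998),
  Species    = c(134567, 134567, 134578, 123557, 123557),
  Unsexed    = c(NA, NA, 10, NA, NA),
  Males      = c(4, NA, NA, NA, 3),
  Females    = c(NA, 5, NA, 7, NA),
  HaulUnique = c("1_2000", "1_2000", "2_2003", "3_1998", "3_1998"),
  Lat_long   = c("50.7_-2.5", "50.7_-2.5", "49.5_-1.5", "50.1_-0.5", "50.1_-0.5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value in x, or NA if there is none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[1]
}

# Collapse the three count columns within each unique haul.
merged <- aggregate(
  A[, c("Unsexed", "Males", "Females")],
  by = A[, c("Survey", "HaulNo", "Year", "Species", "HaulUnique", "Lat_long")],
  FUN = first_non_na
)

# Restore the original column order and sort by haul number.
merged <- merged[order(merged$HaulNo), names(A)]
```

Because the helper returns an existing value rather than computing one, no sum or mean is applied and no new columns are created. Rows with an NA in any of the grouping columns are dropped by aggregate, so it may be worth checking for those first.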

Upvotes: 2
