Reputation: 15
I have a large dataset that looks like this:
structure(list(species = structure(c(1L, 2L, 3L, 4L, 2L, 3L), .Label = c("Aa achalensis",
"Aa argyrolepis", "Aa aurantiaca", "Aa calceata"), class = "factor"),
establishment = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("farm",
"field"), class = "factor"), region = structure(c(1L, 1L,
1L, 1L, 1L, 1L), .Label = "ABT", class = "factor"), leaf.area = c(0.348046463,
0.265755867, 0.382584479, 0.336147631, NA, NA), ses = c(-0.3906,
-0.6257, -0.2987, -0.423, NA, NA)), row.names = c(NA, 6L), class = "data.frame")
I would like to eliminate duplicate observations by using that strategy: change the establishment farm by field in the species that are duplicated, keeping all columns from the farm location (complete information), and eliminating the field one (only NA's in the last 2 columns).
I am pretty new in coding and maybe one of you could help me with it.
Upvotes: 0
Views: 33
Reputation: 389055
You can try this approach -
library(dplyr)
df %>%
group_by(species) %>%
mutate(establishment = if(n() > 1) 'field' else establishment) %>%
filter(!is.na(leaf.area) & !is.na(ses)) %>%
ungroup
# species establishment region leaf.area ses
# <fct> <chr> <fct> <dbl> <dbl>
#1 Aa achalensis farm ABT 0.348 -0.391
#2 Aa argyrolepis field ABT 0.266 -0.626
#3 Aa aurantiaca field ABT 0.383 -0.299
#4 Aa calceata farm ABT 0.336 -0.423
This replaces establishment
value to 'field'
if there are duplicates for the species
and keep non-NA rows.
Upvotes: 1