Reputation: 75
I've got a dataset with yearly data that looks like this:
year site_ID latitude longitude
2000 t2 61.5 -112.0667
2001 t2 61.5 -112.0667
2000 t3 21.2 -110.5
2001 t3 21.2 -110.5
2000 t4 61.5 -112.0667
2001 t4 61.5 -112.0667
As you can see, sites t2
and t4
have the same coordinates. In cases such as these, I want to modify the site_ID
to include the ID of the site that has the same latitude and longitude. So, something like this:
year site_ID latitude longitude
2000 t2_t4 61.5 -112.0667
2001 t2_t4 61.5 -112.0667
2000 t3 21.2 -110.5
2001 t3 21.2 -110.5
2000 t2_t4 61.5 -112.0667
2001 t2_t4 61.5 -112.0667
The names have to be identical (so t2_t4
and NOT t4_t2
) because later I will be aggregating such sites by the site_ID
. Also, in some cases, there will be more than two duplicates. In these cases, I want to follow the same rule just with three ID's, e.g. t2_t4_t5
.
Upvotes: 1
Views: 27
Reputation: 887223
An option with data.table
library(data.table)
setDT(df)[, site_ID := paste(site_ID, collapse = "_"), .(year, latitude, longitude)]
Upvotes: 1
Reputation: 39667
You can use in ave
, unique
and paste
:
x$site_ID <- ave(x$site_ID, x$latitude, x$longitude, FUN=function(y)
paste(unique(y), collapse = "_"))
x
# year site_ID latitude longitude
#1 2000 t2_t4 61.5 -112.0667
#2 2001 t2_t4 61.5 -112.0667
#3 2000 t3 21.2 -110.5000
#4 2001 t3 21.2 -110.5000
#5 2000 t2_t4 61.5 -112.0667
#6 2001 t2_t4 61.5 -112.0667
Data:
x <- read.table(header=TRUE, text="year site_ID latitude longitude
2000 t2 61.5 -112.0667
2001 t2 61.5 -112.0667
2000 t3 21.2 -110.5
2001 t3 21.2 -110.5
2000 t4 61.5 -112.0667
2001 t4 61.5 -112.0667")
Upvotes: 3
Reputation: 39605
Try this grouping by variables and then using mutate()
with paste0()
:
library(dplyr)
#Code
new <- df %>% group_by(year,latitude,longitude) %>%
mutate(site_ID=paste0(site_ID,collapse = '_'))
Output:
# A tibble: 6 x 4
# Groups: year, latitude, longitude [4]
year site_ID latitude longitude
<int> <chr> <dbl> <dbl>
1 2000 t2_t4 61.5 -112.
2 2001 t2_t4 61.5 -112.
3 2000 t3 21.2 -110.
4 2001 t3 21.2 -110.
5 2000 t2_t4 61.5 -112.
6 2001 t2_t4 61.5 -112.
Upvotes: 1