salix_august
salix_august

Reputation: 75

Alter strings in a column based on duplicated data in other columns

I've got a dataset with yearly data that looks like this:

 year site_ID latitude longitude 
 2000   t2     61.5    -112.0667       
 2001   t2     61.5    -112.0667     
 2000   t3     21.2    -110.5
 2001   t3     21.2    -110.5
 2000   t4     61.5    -112.0667
 2001   t4     61.5    -112.0667

As you can see, sites t2 and t4 have the same coordinates. In cases such as these, I want to modify the site_ID to include the ID of the site that has the same latitude and longitude. So, something like this:

 year site_ID latitude longitude 
 2000   t2_t4     61.5    -112.0667       
 2001   t2_t4     61.5    -112.0667     
 2000   t3        21.2    -110.5
 2001   t3        21.2    -110.5
 2000   t2_t4     61.5    -112.0667
 2001   t2_t4     61.5    -112.0667

The names have to be identical (so t2_t4 and NOT t4_t2) because later I will be aggregating such sites by the site_ID. Also, in some cases, there will be more than two duplicates. In these cases, I want to follow the same rule just with three ID's, e.g. t2_t4_t5.

Upvotes: 1

Views: 27

Answers (3)

akrun
akrun

Reputation: 887223

An option with data.table

library(data.table)
setDT(df)[, site_ID := paste(site_ID, collapse = "_"), .(year, latitude, longitude)]

Upvotes: 1

GKi
GKi

Reputation: 39667

You can use in ave, unique and paste:

x$site_ID <- ave(x$site_ID, x$latitude, x$longitude, FUN=function(y)
 paste(unique(y), collapse = "_"))
x
#  year site_ID latitude longitude
#1 2000   t2_t4     61.5 -112.0667
#2 2001   t2_t4     61.5 -112.0667
#3 2000      t3     21.2 -110.5000
#4 2001      t3     21.2 -110.5000
#5 2000   t2_t4     61.5 -112.0667
#6 2001   t2_t4     61.5 -112.0667

Data:

x <- read.table(header=TRUE, text="year site_ID latitude longitude 
 2000   t2     61.5    -112.0667       
 2001   t2     61.5    -112.0667     
 2000   t3     21.2    -110.5
 2001   t3     21.2    -110.5
 2000   t4     61.5    -112.0667
 2001   t4     61.5    -112.0667")

Upvotes: 3

Duck
Duck

Reputation: 39605

Try this grouping by variables and then using mutate() with paste0():

library(dplyr)
#Code
new <- df %>% group_by(year,latitude,longitude) %>%
  mutate(site_ID=paste0(site_ID,collapse = '_'))

Output:

# A tibble: 6 x 4
# Groups:   year, latitude, longitude [4]
   year site_ID latitude longitude
  <int> <chr>      <dbl>     <dbl>
1  2000 t2_t4       61.5     -112.
2  2001 t2_t4       61.5     -112.
3  2000 t3          21.2     -110.
4  2001 t3          21.2     -110.
5  2000 t2_t4       61.5     -112.
6  2001 t2_t4       61.5     -112.

Upvotes: 1

Related Questions