Heliornis

Reputation: 391

List of lists by unique coordinates

I have a data frame like the one below. I want to collapse it so that each unique coordinate maps to a list of its subIDs.

       subID                  latlon
1  S20298920 29.2178694, -94.9342990
2  S35629295 26.7063982, -80.7168961
3  S35844314 26.7063982, -80.7168961
4  S35833936 26.6836236, -80.3512144
7  S30634757 42.4585456, -76.5146989
8  S35834082 26.4330582, -80.9416786
9  S35857972 26.4330582, -80.9416786
10 S35833885 26.7063982, -80.7168961

So here I want (26.7063982, -80.7168961) to be a list containing (S35629295, S35844314, S35833885), and (29.2178694, -94.9342990) to be a list containing just (S20298920). I think a list of lists makes the most sense.

Upvotes: 1

Views: 84

Answers (3)

Tim Biegeleisen

Reputation: 521864

Use aggregate:

out <- aggregate(subID ~ latlon, data = df, FUN = function(x) list(sort(as.character(x))))

Because your data set is long and hard to read, the sample below uses simplified data:

out <- aggregate(name ~ ID, data = df, FUN = function(x) list(sort(as.character(x))))
out
  ID          name
1  1 apple, orange
2  2        orange
3  3 apple, orange

Data:

df <- data.frame(ID=c(1,1,2,3,3),
                 name=c('apple', 'orange', 'orange', 'orange', 'apple'))
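
For completeness, here is a sketch of the same aggregate call applied to the sample rows from the question, followed by one way to pull the list column out as a plain named list. The name ids_by_coord is only illustrative, and the data frame is retyped from the question, so treat this as an assumption about how the real data is stored (latlon as a single string).

```r
# Sample rows retyped from the question (row names omitted).
df <- data.frame(
  subID  = c("S20298920", "S35629295", "S35844314", "S35833936",
             "S30634757", "S35834082", "S35857972", "S35833885"),
  latlon = c("29.2178694, -94.9342990", "26.7063982, -80.7168961",
             "26.7063982, -80.7168961", "26.6836236, -80.3512144",
             "42.4585456, -76.5146989", "26.4330582, -80.9416786",
             "26.4330582, -80.9416786", "26.7063982, -80.7168961"),
  stringsAsFactors = FALSE
)

# One row per unique coordinate; subID becomes a list column of sorted IDs.
out <- aggregate(subID ~ latlon, data = df,
                 FUN = function(x) list(sort(as.character(x))))

# Turn the list column into a named list keyed by coordinate
# (ids_by_coord is a hypothetical name, not part of the answer above).
ids_by_coord <- setNames(out$subID, out$latlon)
ids_by_coord[["26.7063982, -80.7168961"]]
```

Indexing ids_by_coord with a coordinate string then returns the character vector of subIDs recorded at that location.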

Upvotes: 1

alistaire

Reputation: 43362

In the tidyverse, you can either use tidyr::nest, which will nest data frames:

library(tidyverse)

# data_frame() is deprecated; tibble() is the current constructor
df <- tibble(
    subID = c("S20298920", "S35629295", "S35844314", "S35833936",
              "S30634757", "S35834082", "S35857972", "S35833885"),
    latlon = c("29.2178694, -94.934299", "26.7063982, -80.7168961",
               "26.7063982, -80.7168961", "26.6836236, -80.3512144",
               "42.4585456, -76.5146989", "26.4330582, -80.9416786",
               "26.4330582, -80.9416786", "26.7063982, -80.7168961")
)

# tidyr >= 1.0 expects the nested columns to be named
df %>% nest(data = subID)
#> # A tibble: 5 x 2
#>                    latlon             data
#>                     <chr>           <list>
#> 1  29.2178694, -94.934299 <tibble [1 x 1]>
#> 2 26.7063982, -80.7168961 <tibble [3 x 1]>
#> 3 26.6836236, -80.3512144 <tibble [1 x 1]>
#> 4 42.4585456, -76.5146989 <tibble [1 x 1]>
#> 5 26.4330582, -80.9416786 <tibble [2 x 1]>

or just summarize with list to make a list column of vectors:

df %>% 
    group_by(latlon) %>% 
    summarise_all(list)
#> # A tibble: 5 x 2
#>                    latlon     subID
#>                     <chr>    <list>
#> 1 26.4330582, -80.9416786 <chr [2]>
#> 2 26.6836236, -80.3512144 <chr [1]>
#> 3 26.7063982, -80.7168961 <chr [3]>
#> 4  29.2178694, -94.934299 <chr [1]>
#> 5 42.4585456, -76.5146989 <chr [1]>
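
If a plain named list is wanted rather than a list column, something like the following should work. It is a minimal sketch that assumes dplyr is installed and that latlon is stored as a single string; the names by_coord and ids are made up for the example.

```r
library(dplyr)

# Sample rows retyped from the question.
df <- data.frame(
  subID  = c("S20298920", "S35629295", "S35844314", "S35833936",
             "S30634757", "S35834082", "S35857972", "S35833885"),
  latlon = c("29.2178694, -94.934299", "26.7063982, -80.7168961",
             "26.7063982, -80.7168961", "26.6836236, -80.3512144",
             "42.4585456, -76.5146989", "26.4330582, -80.9416786",
             "26.4330582, -80.9416786", "26.7063982, -80.7168961"),
  stringsAsFactors = FALSE
)

# One row per coordinate, with subID collected into a list column.
by_coord <- df %>%
  group_by(latlon) %>%
  summarise(subID = list(subID))

# Name each list element by its coordinate to get a list of vectors.
ids <- setNames(by_coord$subID, by_coord$latlon)
ids[["26.7063982, -80.7168961"]]
```

Within each coordinate the subIDs keep the order in which they appear in df, since nothing here sorts them.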

Upvotes: 0

Onyambu

Reputation: 79288

with(data, tapply(subID, latlon, as.list))

output:

$`26.4330582 -80.9416786`
$`26.4330582 -80.9416786`[[1]]
[1] "S35834082"

$`26.4330582 -80.9416786`[[2]]
[1] "S35857972"


$`26.6836236 -80.3512144`
$`26.6836236 -80.3512144`[[1]]
[1] "S35833936"
   :
   :
   :

data:

data <- read.table(text = "subID latlon
S20298920 '29.2178694 -94.9342990'
S35629295 '26.7063982 -80.7168961'
S35844314 '26.7063982 -80.7168961'
S35833936 '26.6836236 -80.3512144'
S30634757 '42.4585456 -76.5146989'
S35834082 '26.4330582 -80.9416786'
S35857972 '26.4330582 -80.9416786'
S35833885 '26.7063982 -80.7168961'", header = TRUE, stringsAsFactors = FALSE)
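
Note that as.list makes every coordinate a list of single-element lists. If a character vector per coordinate is enough, base R's split is a possible alternative. The sketch below compares the two on the same sample rows; the names nested and flat are only for illustration.

```r
# Same sample rows as above, built directly instead of via read.table().
data <- data.frame(
  subID  = c("S20298920", "S35629295", "S35844314", "S35833936",
             "S30634757", "S35834082", "S35857972", "S35833885"),
  latlon = c("29.2178694 -94.9342990", "26.7063982 -80.7168961",
             "26.7063982 -80.7168961", "26.6836236 -80.3512144",
             "42.4585456 -76.5146989", "26.4330582 -80.9416786",
             "26.4330582 -80.9416786", "26.7063982 -80.7168961"),
  stringsAsFactors = FALSE
)

# tapply + as.list: each coordinate holds a list of length-1 strings.
nested <- with(data, tapply(subID, latlon, as.list))

# split: each coordinate holds one character vector of subIDs.
flat <- split(data$subID, data$latlon)

key <- "26.7063982 -80.7168961"
str(nested[[key]])
str(flat[[key]])
```

Which shape is better depends on what happens next: the nested form is a literal list of lists, while the split form is usually easier to pass to vectorised functions.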

Upvotes: 0
