I have a data frame like the one below. I want to collapse it so that each unique coordinate maps to a list of its subIDs.
subID latlon
1 S20298920 29.2178694, -94.9342990
2 S35629295 26.7063982, -80.7168961
3 S35844314 26.7063982, -80.7168961
4 S35833936 26.6836236, -80.3512144
7 S30634757 42.4585456, -76.5146989
8 S35834082 26.4330582, -80.9416786
9 S35857972 26.4330582, -80.9416786
10 S35833885 26.7063982, -80.7168961
So, here, I want (26.7063982, -80.7168961) to be a list containing (S35629295, S35844314, S35833885), and (29.2178694, -94.9342990) to be a list containing just (S20298920). I think a list of lists makes the most sense.
Upvotes: 1
Use aggregate:
out <- aggregate(subID ~ latlon, data = df, FUN = function(t) list(sort(paste(t))))
Since your data set is large and cumbersome, the example below uses simplified data that is easier to read.
out <- aggregate(name ~ ID, data = df, FUN = function(t) list(sort(paste(t))))
out
ID name
1 1 apple, orange
2 2 orange
3 3 apple, orange
Data:
df <- data.frame(ID=c(1,1,2,3,3),
name=c('apple', 'orange', 'orange', 'orange', 'apple'))
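For a plain named list keyed by the grouping value, base R's split() is another option. This is a minimal sketch on the same toy data, not part of the aggregate approach above; the name by_id is only illustrative, and stringsAsFactors = FALSE is set so the result holds character vectors on older R versions too.

```r
# Toy data from the answer above
df <- data.frame(ID = c(1, 1, 2, 3, 3),
                 name = c("apple", "orange", "orange", "orange", "apple"),
                 stringsAsFactors = FALSE)

# split() groups the values of `name` by `ID` and returns a named list
by_id <- split(df$name, df$ID)

names(by_id)   # group keys, as character strings
by_id[["3"]]   # values for ID 3, in their original row order
```

On the question's data the equivalent call would be split(df$subID, df$latlon), assuming those column names.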
Upvotes: 1
In the tidyverse, you can either use tidyr::nest, which will nest data frames:
library(tidyverse)
df <- tibble(subID = c("S20298920", "S35629295", "S35844314", "S35833936", "S30634757", "S35834082", "S35857972", "S35833885"),
latlon = c("29.2178694, -94.934299", "26.7063982, -80.7168961", "26.7063982, -80.7168961", "26.6836236, -80.3512144", "42.4585456, -76.5146989", "26.4330582, -80.9416786", "26.4330582, -80.9416786", "26.7063982, -80.7168961"))
df %>% nest(data = subID)
#> # A tibble: 5 x 2
#> latlon data
#> <chr> <list>
#> 1 29.2178694, -94.934299 <tibble [1 x 1]>
#> 2 26.7063982, -80.7168961 <tibble [3 x 1]>
#> 3 26.6836236, -80.3512144 <tibble [1 x 1]>
#> 4 42.4585456, -76.5146989 <tibble [1 x 1]>
#> 5 26.4330582, -80.9416786 <tibble [2 x 1]>
or just summarize with list to make a list column of vectors:
df %>%
group_by(latlon) %>%
summarise(subID = list(subID))
#> # A tibble: 5 x 2
#> latlon subID
#> <chr> <list>
#> 1 26.4330582, -80.9416786 <chr [2]>
#> 2 26.6836236, -80.3512144 <chr [1]>
#> 3 26.7063982, -80.7168961 <chr [3]>
#> 4 29.2178694, -94.934299 <chr [1]>
#> 5 42.4585456, -76.5146989 <chr [1]>
Upvotes: 0
with(data,tapply(subID,latlon,as.list))
output:
$`26.4330582 -80.9416786`
$`26.4330582 -80.9416786`[[1]]
[1] "S35834082"
$`26.4330582 -80.9416786`[[2]]
[1] "S35857972"
$`26.6836236 -80.3512144`
$`26.6836236 -80.3512144`[[1]]
[1] "S35833936"
:
:
:
data:
data <- read.table(text = "subID latlon
S20298920 '29.2178694 -94.9342990'
S35629295 '26.7063982 -80.7168961'
S35844314 '26.7063982 -80.7168961'
S35833936 '26.6836236 -80.3512144'
S30634757 '42.4585456 -76.5146989'
S35834082 '26.4330582 -80.9416786'
S35857972 '26.4330582 -80.9416786'
S35833885 '26.7063982 -80.7168961' ", header = TRUE, stringsAsFactors = FALSE)
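Because as.list turns every subID into its own list element, each coordinate ends up holding a nested list. If one character vector per coordinate is preferred, the result can be flattened afterwards. A small sketch on the same data; res and flat are illustrative names, not part of the answer above.

```r
# Same data as above: one row per subID, coordinates stored as a single string
data <- read.table(text = "subID latlon
S20298920 '29.2178694 -94.9342990'
S35629295 '26.7063982 -80.7168961'
S35844314 '26.7063982 -80.7168961'
S35833936 '26.6836236 -80.3512144'
S30634757 '42.4585456 -76.5146989'
S35834082 '26.4330582 -80.9416786'
S35857972 '26.4330582 -80.9416786'
S35833885 '26.7063982 -80.7168961'", header = TRUE, stringsAsFactors = FALSE)

# List of lists: one inner list element per subID
res <- with(data, tapply(subID, latlon, as.list))

# Flatten each inner list into a plain character vector
flat <- lapply(res, unlist)

# Look up one coordinate by its string key
flat[["26.7063982 -80.7168961"]]
```

Both res and flat are indexed by the coordinate string, so the choice mainly depends on whether downstream code expects lists or vectors.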
Upvotes: 0