Reputation: 543
I have a network dataset of adolescent friendships over 7 waves. I'm trying to get the length of a given dyad (directed friendship).
SAMPLE HAVE DATA:
ego alter wave
1 5 1
1 4 1
1 5 2
1 2 2
1 3 2
2 8 1
2 8 2
2 8 3
3 4 1
3 7 1
3 6 1
3 6 2
3 7 3
3 6 3
WANT DATA:
ego alter friendship_length
1 5 2
1 4 1
1 2 1
1 3 1
2 8 3
3 4 1
3 7 1
3 6 3
Here's what I've already tried:
edges_wide <- edges_long %>%
select(ego, alter, wave) %>%
group_by(ego, alter) %>%
mutate(col=seq_along(ego))%>% # add a column indicator
spread(key=col, value=wave)
Which gives me this:
ego alter col3 col4 col5
1 5 1 2 NA
1 4 1 NA NA
1 2 2 NA NA
1 3 2 NA NA
2 8 1 2 3
3 4 1 NA NA
3 7 1 3 NA
3 6 1 2 3
From here I'm not sure how to get the wave span (length) of each directed friendship without counting non-consecutive nominations (like ego 3, alter 7).
Upvotes: 1
Views: 91
Reputation: 2435
It should be possible to have a shorter solution.
If I understand correctly, you want to count only the first run of consecutive waves in which ego nominates alter. We can therefore add a within-pair index with row_number(), shift it by min(wave) - 1 to account for pairs whose first wave is later than 1, and then count the rows where wave and this shifted id coincide. For a given pair, as soon as one wave is skipped in the data, the two indices differ for every later row.
d %>%
arrange(wave) %>%
group_by(ego, alter) %>%
mutate(id = row_number() + min(wave) - 1) %>%
summarise(friendship_length = sum(wave == id))
# A tibble: 8 x 3
# Groups: ego [3]
ego alter friendship_length
<int> <int> <int>
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 2
5 2 8 3
6 3 4 1
7 3 6 3
8 3 7 1
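To see why comparing wave with the shifted index works, here is a minimal base-R sketch for single pairs taken from the sample data. The helper name first_run_length is my own (it is not a dplyr function), and the sketch assumes at most one row per wave within a pair:

```r
# Hypothetical helper: length of the initial run of consecutive waves
# for a single ego-alter pair.
first_run_length <- function(waves) {
  waves <- sort(waves)
  # The wave we would expect in each row if no wave had been skipped so far
  id <- seq_along(waves) + min(waves) - 1
  sum(waves == id)
}

first_run_length(c(1, 3))     # ego 3, alter 7: wave 2 is skipped
#> [1] 1
first_run_length(c(1, 2, 3))  # ego 2, alter 8: no gap
#> [1] 3
```

Once a wave is skipped, every later wave is larger than its expected id, so only the rows before the first gap are counted.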
EDIT
Addressing the new comment: we want the longest spell of consecutive friendship ties. Within a pair sorted by wave, wave - row_number() stays constant across a run of consecutive waves and jumps after every gap, so it serves as a friendship-phase id. We can then count how many times each id shows up and take the max:
dd %>%
arrange(wave) %>%
group_by(ego, alter) %>%
count(wave - row_number() ) %>%
summarise(friendship_length = max(n))
# A tibble: 9 x 3
# Groups: ego [3]
ego alter friendship_length
<int> <int> <dbl>
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 2
5 2 8 3
6 3 4 1
7 3 6 3
8 3 7 1
9 3 8 3
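The phase-id idea can be checked by hand on one pair. Below is a rough base-R sketch for ego 3, alter 8 from dd; the helper name longest_run_length is hypothetical, and it assumes the pair has at least one wave:

```r
# Hypothetical helper: longest run of consecutive waves for one pair.
longest_run_length <- function(waves) {
  waves <- sort(unique(waves))
  # Constant within a run of consecutive waves, changes after each gap
  run_id <- waves - seq_along(waves)
  max(table(run_id))
}

longest_run_length(c(2, 3, 8, 6, 7))  # ego 3, alter 8
#> [1] 3
```

Sorted, the waves are 2, 3, 6, 7, 8, so the ids are 1, 1, 3, 3, 3: one phase of length 2 and one of length 3.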
Data
library(dplyr)
d <- read.table(text = "
ego alter wave
1 5 1
1 4 1
1 5 2
1 2 2
1 3 2
2 8 1
2 8 2
2 8 3
3 4 1
3 7 1
3 6 1
3 6 2
3 7 3
3 6 3", header = TRUE)
dd <- read.table(text = "
ego alter wave
1 5 1
1 4 1
1 5 2
1 2 2
1 3 2
2 8 1
2 8 2
2 8 3
3 4 1
3 7 1
3 6 1
3 6 2
3 7 3
3 6 3
3 8 2
3 8 3
3 8 8
3 8 6
3 8 7", header = TRUE)
Upvotes: 2
Reputation: 3134
One more possibility.
First, let's make a function that counts the length of a consecutive sequence:
get_seq_len <- function(s){
if(length(s) == 0) return(0)
if(length(s) == 1) return(1)
consec_lengths <- rle(c(1, s[-1] - s[-length(s)]))$lengths
return(consec_lengths[1])
}
We can verify it works:
get_seq_len(numeric(0))
#> 0
get_seq_len(1)
#> 1
get_seq_len(1:4)
#> 4
get_seq_len(c(1:4, 4:5))
#> 4 (the repeated 4 ends the consecutive run)
get_seq_len(c(1,3))
#> 1 (not consecutive)
Then we can simply use nesting to do that for each pair:
library(dplyr)
library(tidyr) # nest()
library(purrr) # map(), map_dbl()
edges_long %>%
group_by(ego, alter) %>%
nest() %>%
mutate(vec_waves = map(data, ~ as.numeric(unlist(.x)))) %>% # convert dataframe to vector
mutate(len = map_dbl(vec_waves, get_seq_len))
# A tibble: 8 x 5
# Groups: ego, alter [8]
# ego alter data vec_waves len
# <dbl> <dbl> <list> <list> <dbl>
# 1 1 5 <tibble [2 x 1]> <dbl [2]> 2
# 2 1 4 <tibble [1 x 1]> <dbl [1]> 1
# 3 1 2 <tibble [1 x 1]> <dbl [1]> 1
# 4 1 3 <tibble [1 x 1]> <dbl [1]> 1
# 5 2 8 <tibble [3 x 1]> <dbl [3]> 3
# 6 3 4 <tibble [1 x 1]> <dbl [1]> 1
# 7 3 7 <tibble [2 x 1]> <dbl [2]> 1
# 8 3 6 <tibble [3 x 1]> <dbl [3]> 3
Upvotes: 1
Reputation: 543
This is probably a terrible way to do it but this worked!
edges_wide <- edges_long %>%
select(ego, alter, wave) %>%
group_by(ego, alter) %>%
mutate(col=seq_along(ego))%>% # add a column indicator
spread(key=col, value=wave) %>%
rename(col1 = "1", col2 = "2", col3 = "3",
col4 = "4", col5 = "5", col6 = "6",
col7 = "7")
edges_wide <- edges_wide %>%
mutate(wave1 = case_when(col1 == 1 ~ 1,
TRUE ~ as.numeric(0))) %>%
mutate(wave2 = case_when(col1 == 2 | col2 == 2 ~ 1,
TRUE ~ as.numeric(0))) %>%
mutate(wave3 = case_when(col1 == 3 | col2 == 3 | col3 == 3 ~ 1,
TRUE ~ as.numeric(0))) %>%
mutate(wave4 = case_when(col1 == 4 | col2 == 4 | col3 == 4 | col4 == 4 ~ 1,
TRUE ~ as.numeric(0))) %>%
mutate(wave5 = case_when(col1 == 5 | col2 == 5 | col3 == 5 | col4 == 5 | col5 == 5 ~ 1,
TRUE ~ as.numeric(0))) %>%
mutate(wave6 = case_when(col1 == 6 | col2 == 6 | col3 == 6 | col4 == 6 | col5 == 6 | col6 == 6 ~ 1,
TRUE ~ as.numeric(0))) %>%
mutate(wave7 = case_when(col1 == 7 | col2 == 7 | col3 == 7 | col4 == 7 | col5 == 7 | col6 == 7 | col7 == 7 ~ 1,
TRUE ~ as.numeric(0))) %>%
select(ego, alter, wave1, wave2, wave3, wave4, wave5, wave6, wave7)
most_consecutive_val = function(x, val = 1) {
with(rle(x), if(all(values != val)) 0 else max(lengths[values == val]))
}
edges_wide$span <- apply(edges_wide[-c(1:2)], MARGIN = 1, most_consecutive_val)
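For what it's worth, the same 0/1 wave indicators can probably be built without the case_when chain. This is only a sketch on the sample data from the question, assuming 7 waves; the names pair, presence and span are my own:

```r
# Sample data from the question
edges_long <- data.frame(
  ego   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  alter = c(5, 4, 5, 2, 3, 8, 8, 8, 4, 7, 6, 6, 7, 6),
  wave  = c(1, 1, 2, 2, 2, 1, 2, 3, 1, 1, 1, 2, 3, 3)
)

# One row per pair, one column per wave (1..7); cells are 0/1 presence flags
pair <- paste(edges_long$ego, edges_long$alter, sep = "-")
counts <- unclass(table(pair, factor(edges_long$wave, levels = 1:7)))
presence <- (counts > 0) * 1

# Same helper as above: longest run of `val` in a vector
most_consecutive_val <- function(x, val = 1) {
  with(rle(x), if (all(values != val)) 0 else max(lengths[values == val]))
}

span <- apply(presence, 1, most_consecutive_val)
span[["3-7"]]  # waves 1 and 3 only, so the longest run is 1
#> [1] 1
```

The result is a named vector with one entry per ego-alter pair, so it would still need to be joined back to the pairs if a data frame is wanted.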
Upvotes: 0