user14250906

Reputation: 197

Creating a variable that marks the first observed value in one column found in the other column

The key variables in the data are 'listener' and 'speaker'. The first listener observed in each 'thread' is the original writer of that thread.

I am trying to create a separate variable, 'writer_involvement', a binary (0/1) flag that marks the rows where the writer of the thread was the speaker.

Test data:

df <- structure(list(topic = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2), thread = c(10, 
10, 10, 10, 3, 3, 3, 3, 3, 3), listener = c(111, 111, 222, 111, 
222, 444, 333, 222, 444, 222), speaker = c(222, 333, 111, 444, 
444, 333, 222, 333, 222, 444)), class = "data.frame", row.names = c(NA, 
-10L), codepage = 65001L)

The end result would look like:

╔═══════╦════════╦══════════╦═════════╦════════════════════╦═══════════════════════════════════════════════════════════════╗
║ topic ║ thread ║ listener ║ speaker ║ writer_involvement ║ explanation                                                   ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   1   ║   10   ║    111   ║   222   ║          0         ║ The first observed listener (111) is the writer of the thread ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   1   ║   10   ║    111   ║   333   ║          0         ║                                                               ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   1   ║   10   ║    222   ║   111   ║          1         ║ The writer of this thread, 111, spoke here                    ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   1   ║   10   ║    111   ║   444   ║          0         ║                                                               ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   2   ║    3   ║    222   ║   444   ║          0         ║ The first observed listener (222) is the writer               ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   2   ║    3   ║    444   ║   333   ║          0         ║                                                               ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   2   ║    3   ║    333   ║   222   ║          1         ║ The writer of this thread, 222, spoke here                    ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   2   ║    3   ║    222   ║   333   ║          0         ║                                                               ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   2   ║    3   ║    444   ║   222   ║          1         ║ The writer of this thread, 222, spoke here                    ║
╠═══════╬════════╬══════════╬═════════╬════════════════════╬═══════════════════════════════════════════════════════════════╣
║   2   ║    3   ║    222   ║   444   ║          0         ║                                                               ║
╚═══════╩════════╩══════════╩═════════╩════════════════════╩═══════════════════════════════════════════════════════════════╝

Upvotes: 1

Views: 33

Answers (2)

ThomasIsCoding

Reputation: 101099

A base R option using ave

within(
  df,
  writer_involvement <- +(ave(listener, topic, thread, FUN = function(x) head(x, 1)) == speaker)
)

gives

   topic thread listener speaker writer_involvement
1      1     10      111     222                  0
2      1     10      111     333                  0
3      1     10      222     111                  1
4      1     10      111     444                  0
5      2      3      222     444                  0
6      2      3      444     333                  0
7      2      3      333     222                  1
8      2      3      222     333                  0
9      2      3      444     222                  1
10     2      3      222     444                  0

A data.table option

setDT(df)[, writer_involvement := +(speaker == head(listener, 1)), .(topic, thread)]

gives

> df
    topic thread listener speaker writer_involvement
 1:     1     10      111     222                  0
 2:     1     10      111     333                  0
 3:     1     10      222     111                  1
 4:     1     10      111     444                  0
 5:     2      3      222     444                  0
 6:     2      3      444     333                  0
 7:     2      3      333     222                  1
 8:     2      3      222     333                  0
 9:     2      3      444     222                  1
10:     2      3      222     444                  0
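
Each topic in the test data contains exactly one thread, so grouping by topic alone would give the same result there. The following is a minimal base R sketch of why grouping by thread as well may matter. It assumes hypothetical data (`df2`, not from the question) in which one topic holds two threads with different writers; `first_value`, `by_thread` and `by_topic` are illustrative names only.

```r
# Hypothetical data: topic 1 holds two threads with different first listeners
df2 <- data.frame(
  topic    = c(1, 1, 1, 1),
  thread   = c(10, 10, 11, 11),
  listener = c(111, 222, 333, 111),
  speaker  = c(222, 111, 111, 333)
)

# Helper returning the first element of a group
first_value <- function(x) head(x, 1)

# Writer looked up within each (topic, thread) combination
by_thread <- +(ave(df2$listener, df2$topic, df2$thread, FUN = first_value) == df2$speaker)

# Writer looked up within each topic only
by_topic <- +(ave(df2$listener, df2$topic, FUN = first_value) == df2$speaker)

# Show both flags side by side
cbind(df2, by_thread, by_topic)
```

In this sketch the two flags disagree for the rows of thread 11, because grouping by topic alone takes the first listener of thread 10 as the writer of every thread in the topic.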

Upvotes: 0

akrun

Reputation: 886948

We can use match with nomatch = 0 after grouping by 'topic' and 'thread', since the writer is defined per thread

library(dplyr)
df %>%
   group_by(topic, thread) %>%
   mutate(writer_involvement = match(speaker, first(listener), nomatch = 0)) %>%
   ungroup()

-output

# A tibble: 10 x 5
#   topic thread listener speaker writer_involvement
#   <dbl>  <dbl>    <dbl>   <dbl>              <int>
# 1     1     10      111     222                  0
# 2     1     10      111     333                  0
# 3     1     10      222     111                  1
# 4     1     10      111     444                  0
# 5     2      3      222     444                  0
# 6     2      3      444     333                  0
# 7     2      3      333     222                  1
# 8     2      3      222     333                  0
# 9     2      3      444     222                  1
#10     2      3      222     444                  0

Or create a logical output and coerce it to binary with unary +

df %>%
   group_by(topic, thread) %>%
   mutate(writer_involvement = +(speaker == first(listener))) %>%
   ungroup()

Upvotes: 1
