Reputation: 33
I have searched for a specific answer to my question but without success.
First of all, I have a data frame consisting of 48 variables, which looks something like this:
> df
Text Screen_Name ...
1 a text where @Sam and @Su and @Jim are addressed Peter
2 a text where @Eric is addressed Margret
3 a text where @Sarah and @Adam are addressed John
Now I am extracting all substrings that match the regular expression @\S+ and storing them in a new column:
df$Addressees <- str_extract_all(df$Text, "@\\S+")
This gets me:
... Screen_Name Addressees ...
1 Peter c("@Sam", "@Su", "@Jim")
2 Margret @Eric
3 John c("@Sarah", "@Adam")
Now I want to create a new data frame for the two columns where new rows for each "Addressee" are created by repeating the respective value of column "Screen_Name":
> df
Screen_Name Addressees
1 Peter Sam
2 Peter Su
3 Peter Jim
4 Margret Eric
5 John Sarah
6 John Adam
I have tried solutions to similar problems, but none of them seem to work.
Thank you very much for your help!
Upvotes: 3
Views: 2116
Reputation: 42544
You may also try data.table using the df created by @raistlin:
library(data.table)
setDT(df)[, .(friends = unlist(friends)), by = "ego"]
ego friends
1: peter sam
2: peter su
3: peter jim
4: margaret eric
5: john sarah
6: john adam
Now, with the additional context supplied by the OP, the data.table solution can be streamlined to solve the underlying problem in a one-liner.
To remove the leading @ in the Addressees column as requested by the OP, the regular expression needs to be modified to use a positive lookbehind.
library(data.table)
# read data (to make it a reproducible example)
dt <- fread("Text; Screen_Name
a text where @Sam and @Su and @Jim are addressed; Peter
a text where @Eric is addressed; Margret
a text where @Sarah and @Adam are addressed; John")
# use str_extract_all with modified regex
dt[, .(Addressees = unlist(stringr::str_extract_all(Text, "(?<=@)\\S+"))),
by = .(Screen_Name)]
# Screen_Name Addressees
#1: Peter Sam
#2: Peter Su
#3: Peter Jim
#4: Margret Eric
#5: John Sarah
#6: John Adam
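To see what the lookbehind changes, here is a minimal base R sketch that runs both patterns on two sample strings from the question. It uses regmatches() and gregexpr() instead of stringr, so that it has no package dependencies. The variable names x, with_at and without_at are made up for illustration.

```r
# Sample strings taken from the question
x <- c("a text where @Sam and @Su and @Jim are addressed",
       "a text where @Eric is addressed")

# Plain pattern: the "@" is part of each match
with_at <- regmatches(x, gregexpr("@\\S+", x, perl = TRUE))

# Lookbehind pattern: "@" must precede the match but is not part of it.
# Base R needs perl = TRUE for lookaround syntax.
without_at <- regmatches(x, gregexpr("(?<=@)\\S+", x, perl = TRUE))

with_at[[1]]     # "@Sam" "@Su"  "@Jim"
without_at[[1]]  # "Sam" "Su"  "Jim"
```

Both calls return a list with one character vector per input string, which is the same shape str_extract_all() produces.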
Upvotes: 3
Reputation: 2496
Does this help?
Input:
Screen_Name <- c("Peter", "Margaret", "John")
Addressees <- list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))
The tidyverse way:
library(dplyr)
df <- tibble(Screen_Name, Addressees) %>%
  tidyr::unnest(Addressees)
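For comparison, here is a sketch of a tidyverse pipeline that starts from the Text column in the question rather than from hand-built vectors. It assumes dplyr, stringr and tidyr are installed, and it swaps in tidyr::unnest() to turn the list column into rows. The sample data and the name long_df are illustrative only.

```r
library(dplyr)    # tibble(), mutate(), select(), %>%
library(stringr)  # str_extract_all()
library(tidyr)    # unnest()

# Sample data modelled on the question (only the two relevant columns)
df <- tibble(
  Text = c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed",
           "a text where @Sarah and @Adam are addressed"),
  Screen_Name = c("Peter", "Margret", "John")
)

long_df <- df %>%
  # list column: one character vector of handles per row, "@" excluded
  mutate(Addressees = str_extract_all(Text, "(?<=@)\\S+")) %>%
  # one output row per element of the list column
  unnest(Addressees) %>%
  select(Screen_Name, Addressees)
```

With its default settings, unnest() drops rows whose list element is empty, so texts that mention nobody would not appear in long_df.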
Upvotes: 0
Reputation: 2939
OK, with a reproducible example:
# create df
ego <- c("peter","margaret","john")
friends <- list(c("sam","su","jim"),c("eric"),c("sarah","adam"))
df <- data.frame(ego, friends = I(friends), stringsAsFactors = FALSE)
# use repeat function to repeat rows
times <- sapply(df$friends,length)
df <- df[rep(seq_len(nrow(df)), times),]
# assign back unlisted friends
df$friends <- unlist(friends)
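The same repeat-and-unlist idea can be applied directly to the columns from the question. The following is a base R sketch under the assumption that only Text and Screen_Name matter. The sample data, the intermediate name addressees and the result name long_df are illustrative. It uses lengths() in place of sapply(..., length) and builds a new data frame instead of overwriting df.

```r
# Sample data modelled on the question (only the two relevant columns)
df <- data.frame(
  Text = c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed",
           "a text where @Sarah and @Adam are addressed"),
  Screen_Name = c("Peter", "Margret", "John"),
  stringsAsFactors = FALSE
)

# One character vector of handles per row, without the leading "@"
addressees <- regmatches(df$Text,
                         gregexpr("(?<=@)\\S+", df$Text, perl = TRUE))

# Repeat each Screen_Name once per extracted handle, then flatten the list
long_df <- data.frame(
  Screen_Name = rep(df$Screen_Name, times = lengths(addressees)),
  Addressees  = unlist(addressees),
  stringsAsFactors = FALSE
)
```

Rows whose text contains no handle get a repeat count of zero and therefore do not appear in long_df.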
Upvotes: 4