Mary

Reputation: 151

Join multiple values into the same cell in R

I have a data frame of POS tags in which each document is split into single tokens, one row per token. How can I merge the individual POS values for each document into a single cell, separated by commas? Right now I have something like this:

  doc_id sentence_id token_id    token  pos entity
1  text1           1        1   xxxxxx PRON       
2  text1           1        2     xxxx  AUX       
3  text1           1        3      xxx  AUX       
4  text1           1        4  xxxxxxx VERB       
5  text2           1        5     xxxx  DET       
6  text2           1        6      xxx NOUN  

How can I make it into

  doc_id                      pos    entity
1  text1  PRON, AUX, AUX, VERB...       
2  text2  AUX, NOUN, PRON, ADJ...       
3  text3  ...
4  text4  ...  
5  text5  ...
6  text6  ...

Do I need to create a new data frame, or is there a spaCy function that can do this directly? Thank you.

Upvotes: 1

Views: 947

Answers (2)

akrun

Reputation: 887038

We could use dplyr

library(dplyr)

# One row per doc_id/entity combination; toString() joins the tags with ", "
df1 %>%
  group_by(doc_id, entity) %>%
  summarise(pos = toString(pos), .groups = 'drop')
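
Note that the call above groups by entity as well as doc_id, so a document whose tokens carry different entity values would come back as several rows. If one row per document is what is wanted, a minimal sketch that groups on doc_id alone might look like the following. The data frame here is toy data made up to mirror the question, not the asker's real data, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data mirroring the question; the values are illustrative only.
df1 <- data.frame(
  doc_id   = c("text1", "text1", "text1", "text1", "text2", "text2"),
  token_id = 1:6,
  pos      = c("PRON", "AUX", "AUX", "VERB", "DET", "NOUN"),
  entity   = "",
  stringsAsFactors = FALSE
)

# Collapse to one row per document; toString() joins values with ", ".
result <- df1 %>%
  group_by(doc_id) %>%
  summarise(pos = toString(pos), .groups = "drop")

print(result)
```

The entity column is dropped here; whether to keep it as a grouping key depends on whether it is constant within each document.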

Upvotes: 1

Aqeel Padaria

Reputation: 231

You can collapse it like so:

aggregate(pos ~ doc_id, doc_df, paste, collapse = ", ")

You can store the result in a separate data frame and merge in any other columns you want to include from the original. If you only need these two columns, you can use the result directly.
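
As a rough sketch of the "store and merge" route in base R: the `source` column below is a hypothetical document-level column invented for illustration, and the data is toy data modelled on the question.

```r
# Toy data; `source` is a hypothetical per-document column.
doc_df <- data.frame(
  doc_id = c("text1", "text1", "text1", "text1", "text2", "text2"),
  pos    = c("PRON", "AUX", "AUX", "VERB", "DET", "NOUN"),
  source = c("a.txt", "a.txt", "a.txt", "a.txt", "b.txt", "b.txt"),
  stringsAsFactors = FALSE
)

# Collapse the POS tags to one comma-separated string per document.
pos_df <- aggregate(pos ~ doc_id, doc_df, paste, collapse = ", ")

# Keep one row per document for the extra column, then join it back on doc_id.
doc_cols <- unique(doc_df[, c("doc_id", "source")])
merged <- merge(pos_df, doc_cols, by = "doc_id")

print(merged)
```

This only gives one row per document if the extra columns are constant within each doc_id; token-level columns would need their own collapse step.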

Upvotes: 3
