user15196250

Reputation: 11

Applying pre-trained facebook/bart-large-cnn for text summarization in Python on a DataFrame column

I am working with Hugging Face Transformers (summarization pipelines) and have gained some insight into them. I am using the facebook/bart-large-cnn model for text summarization, with the code below:

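from transformers import pipeline

# Without a model argument, pipeline("summarization") loads a default checkpoint
# rather than facebook/bart-large-cnn, so request that model explicitly
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = "Good Morning team, I need a help in terms of one of the functions that needs to be written on the servers.. please let me know wen are you available.. Thanks , hgjhghjgjh, 193-6757-568"
# Note: min_length/max_length are counted in tokens, while len(text) counts characters
print(summarizer(str(text), min_length=int(0.1 * len(str(text))), max_length=int(0.2 * len(str(text))), do_sample=False))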
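The pipeline's min_length and max_length are measured in tokens, whereas len(str(text)) counts characters, so the 10% / 20% limits above can end up larger than the tokenized input. A minimal sketch that applies the same ratios to the token count instead; the helper name token_length_limits, the floor of 5 tokens, and the whitespace stand-in tokenizer are illustrative assumptions, not part of the original code:

```python
def token_length_limits(text, tokenizer, min_ratio=0.1, max_ratio=0.2, floor=5):
    """Return (min_length, max_length) measured in tokens rather than characters.

    `tokenizer` is any callable returning a dict with an "input_ids" list,
    such as the tokenizer attached to a Hugging Face pipeline (assumption).
    """
    n_tokens = len(tokenizer(text)["input_ids"])
    # Same 10% / 20% ratios as the question, but applied to the token count
    min_length = max(floor, int(min_ratio * n_tokens))
    # Keep max_length strictly greater than min_length
    max_length = max(min_length + 1, int(max_ratio * n_tokens))
    return min_length, max_length


# Hypothetical stand-in tokenizer for illustration: splits on whitespace
def whitespace_tokenizer(text):
    return {"input_ids": text.split()}


print(token_length_limits("word " * 200, whitespace_tokenizer))  # → (20, 40)
```

With a real pipeline you would pass its own tokenizer, e.g. token_length_limits(text, summarizer.tokenizer), and feed the two values to the summarizer call.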

My question is: how can I apply the same pre-trained model to a column of my DataFrame? My DataFrame looks like this:

ID       Text
1          some long text here...
2          some long text here...
3          some long text here...
... and so on, for 100K rows

I want to apply the pre-trained model to the Text column to generate a new column, df['Summary_Text'], so that the resulting DataFrame looks like this:

ID          Text                              Summary_Text
1          some long text here...           Text summary goes here...
2          some long text here...           Text summary goes here...
3          some long text here...           Text summary goes here...

How can I do this? Any quick help would be highly appreciated.

Upvotes: 1

Views: 3432

Answers (2)

Ahmed Saber

Reputation: 1

This is my code to iterate through the rows of one Excel column (column 8, i.e. H) and write each summary into another column (column 10, i.e. J) of the same row. Hope this helps.

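from transformers import pipeline
import openpyxl

path = "input.xlsx"  # hypothetical workbook path; replace with your own file
wb = openpyxl.load_workbook(path, read_only=False)
ws = wb["sheet"]
bart_summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Read column 8 (H), rows 2-5, and write each summary to column 10 (J) of the same row
for row in ws.iter_rows(min_col=8, min_row=2, max_col=8, max_row=5):
    for cell in row:
        text_to_summarize = cell.value
        if not text_to_summarize:
            continue  # skip empty cells
        summary = bart_summarizer(str(text_to_summarize), min_length=10, max_length=100)
        # The pipeline returns a list of dicts; store only the summary string
        ws.cell(row=cell.row, column=10).value = summary[0]["summary_text"]

# Save once after the loop instead of after every cell
wb.save(path)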

Upvotes: 0

Gaurav Hazra

Reputation: 432

I am working along the same lines, summarizing news articles. You can pass either a single string or a list of strings to the pipeline. First, convert your DataFrame's 'Text' column to a list:

input_col = df['Text'].to_list()

Then feed it to your model:

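from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# min_length/max_length are in tokens and apply to every text in the list,
# so fixed values are used here (illustrative choices; tune them for your data)
res = summarizer(input_col, min_length=30, max_length=130, do_sample=False)
print(res[0]['summary_text'])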

This returns a list of dicts (one per input text) and prints only the first summary. You can iterate over the list (res[1]['summary_text'], res[2]['summary_text'], and so on), collect the summaries, and add them back as a DataFrame column:

# Collect the summary string from each result dict, in input order
df_res = [r['summary_text'] for r in res]

df['Summary_Text'] = df_res
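With 100K rows, passing the whole list in one call gives no feedback until everything finishes. A minimal sketch that feeds the pipeline in fixed-size chunks and reports progress; the helper summarize_in_chunks, the chunk size, and the fake_summarizer stand-in (used so the logic can run without downloading a model) are assumptions for illustration. It relies only on the output format shown above: a list of dicts with a 'summary_text' key, in input order.

```python
def summarize_in_chunks(texts, summarizer, chunk_size=32, **generate_kwargs):
    """Summarize a list of strings chunk by chunk; return one summary per text.

    `summarizer` is assumed to behave like a summarization pipeline: given a
    list of strings, it returns a list of {"summary_text": ...} dicts.
    """
    summaries = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        results = summarizer(chunk, **generate_kwargs)
        summaries.extend(r["summary_text"] for r in results)
        # Simple progress report after each chunk
        print(f"summarized {min(start + chunk_size, len(texts))}/{len(texts)}")
    return summaries


# Hypothetical stand-in summarizer: keeps the first three words of each text
def fake_summarizer(batch, **kwargs):
    return [{"summary_text": " ".join(t.split()[:3])} for t in batch]


texts = [f"row {i} some long text here" for i in range(5)]
print(summarize_in_chunks(texts, fake_summarizer, chunk_size=2))
```

With the real pipeline, the call would look like df['Summary_Text'] = summarize_in_chunks(df['Text'].to_list(), summarizer, chunk_size=32, min_length=30, max_length=130, do_sample=False); the extra keyword arguments are passed through unchanged to each pipeline call.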

If your articles are long, also pass truncation=True to the summarizer call (alongside min_length etc.). facebook/bart-large-cnn accepts at most 1024 input tokens, and longer inputs can otherwise cause an error.
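To see how many rows truncation would actually affect, you can count the texts whose tokenized length exceeds the model's 1024-token input limit. A minimal sketch; the helper count_too_long and the whitespace stand-in tokenizer are illustrative assumptions, and with the real model you would pass summarizer.tokenizer instead:

```python
def count_too_long(texts, tokenizer, max_tokens=1024):
    """Count how many texts have more than `max_tokens` tokens after tokenization.

    With facebook/bart-large-cnn (1024-token input limit), these are the rows
    that truncation=True would cut short.
    """
    return sum(1 for t in texts if len(tokenizer(t)["input_ids"]) > max_tokens)


# Hypothetical stand-in tokenizer for illustration only: splits on whitespace
def whitespace_tokenizer(text):
    return {"input_ids": text.split()}


texts = ["short text", "word " * 2000]
print(count_too_long(texts, whitespace_tokenizer))  # → 1
```

A real subword tokenizer usually produces more tokens than whitespace splitting and adds special tokens, so the stand-in undercounts; it only shows the shape of the check.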

This takes a long time on a CPU, and I am still looking for faster alternatives myself. For now, XLNet is a usable option for me. Hope this helps!
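One common way to speed things up, if a CUDA GPU is available, is to place the pipeline on it through its device argument (0 for the first GPU, -1 for the CPU). A minimal sketch of choosing that index; the helper name pick_device is hypothetical, and it falls back to the CPU when PyTorch is not installed:

```python
def pick_device():
    """Return a device index for transformers.pipeline: 0 = first GPU, -1 = CPU."""
    try:
        import torch
    except ImportError:
        return -1  # PyTorch not installed: fall back to the CPU
    return 0 if torch.cuda.is_available() else -1


print(pick_device())
```

It would then be used as summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=pick_device()). On a CPU-only machine this changes nothing, so the CPU runtime concern above still applies there.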

Upvotes: 2
