Reputation: 59
I'm trying to create a new column in the below sample data frame that combines the data from the customer_response and the bot_response into a list shown in the combined column of the second table shown below.
One thing to note is that there will always be the same number of delimiters for the customer and bot column for each row.
df
conv_id | customer_response | bot_response |
---|---|---|
1 | Make a payment; Credit Card; $1000; Thanks! | I can help with that. Which account?; What amount?; Payment successful; You're Welcome! |
2 | Replace a card; Ending in 4352; overnight; thanks | I can help replace your card, which account?; How soon do you need it; The replacement card will arrive in 3-5 business days; No problem |
3 | What is my due date?; Am I past due?; thanks | Hello, your due date is the 3rd of each month; No, your account is current; you're welcome! |
Desired output:
conv_id | customer_response | bot_response | combined |
---|---|---|---|
1 | Make a payment; Credit Card; $1000; Thanks! | I can help with that. Which account?; What amount?; Payment successful; You're Welcome! | ["Customer: Make a payment", "Bot: I can help with that. Which account?", "Customer: Credit Card", "Bot: What amount?", "Customer: $1000", "Bot: Payment successful", "Customer: Thanks!", "Bot: You're Welcome!"] |
2 | ... | ... | ... |
3 | ... | ... | ... |
Here is my code so far but for some reason I'm drawing a blank on how to create a column to combine the two as shown in the combined column in the second table above.
df['cust_dil'] = [x for x in df['customer_response'].str.split(';')]
df['bot_dil'] = [x for x in df['bot_response'].str.split(';')]
At a high level, I know I need to loop through the elements in each row after splitting them and add the the respective text "Bot" or "Customer" before each element in the list and then combine the 2 columns in order.
I'm really stumped on this so any help that can be given is greatly appreciated.
Upvotes: 2
Views: 66
Reputation: 7799
Here is a more pandaic
approach to your problem with no loops
, comprehension list
and apply
logic.
import pandas as pd
import numpy as np
data_dict= {
'conv_id': [1, 2, 3],
'customer_response': ['Make a payment; Credit Card; $1000; Thanks!', 'Replace a card; Ending in 4352; overnight; thanks', 'What is my due date?; Am I past due?; thanks'],
'bot_response': ['I can help with that. Which account?; What amount?; Payment successful; You\'re Welcome!', 'I can help replace your card, which account?; How soon do you need it; The replacement card will arrive in 3-5 business days; No problem', 'Hello, your due date is the 3rd of each month; No, your account is current; you\'re welcome!']}
df = pd.DataFrame(data_dict)
bot_response_split = df["bot_response"].str.split(';', expand=True).fillna("").applymap('Bot : {}'.format).replace('Bot : $', '',regex=True)
customer_response_split = df["customer_response"].str.split(';', expand=True).fillna("").applymap('Customer : {}'.format).replace('Customer : $', '',regex=True)
bot_response_split.columns = np.arange(len(customer_response_split.columns), len(customer_response_split.columns) + len(bot_response_split.columns))
new_columns = list(customer_response_split.columns) + list(bot_response_split.columns)
new_columns[::2] = list(customer_response_split.columns)
new_columns[1::2] = list(bot_response_split.columns)
df['combined'] = pd.concat([customer_response_split, bot_response_split], axis=1).reindex(columns=new_columns).values.tolist()
Code explanation :
;
, those many columns will be created as the number of chats for each cell in the column, then we fill all our None
values with empty string
, then based on regex we prepend the appropriate third person in the chat eg Bot and customer
here. After that we agin apply regex to replace those cells by empty string
for which applymap added 'Customer : ' but the cell was empty.bot_response_split = df["bot_response"].str.split(';', expand=True).fillna("").applymap('Bot : {}'.format).replace('Bot : $', '',regex=True)
customer_response_split = df["customer_response"].str.split(';', expand=True).fillna("").applymap('Customer : {}'.format).replace('Customer : $', '',regex=True)
bot_response_datframe
as this will help in interleaving.bot_response_split.columns = np.arange(len(customer_response_split.columns), len(customer_response_split.columns) + len(bot_response_split.columns))
customer
and bot
dataframe to a third column list which will help us in interleaving new_columns
new_columns = list(customer_response_split.columns) + list(bot_response_split.columns)
customer
dataframe we take the column list and insert it to new_columns
alternatively. We are doing the same for bot
dataframe column as well.new_columns[::2] = list(customer_response_split.columns)
new_columns[1::2] = list(bot_response_split.columns)
customer
and bot
dataframe and re order our indexes such that the columns are interleaved, then we join all the columns so that the values are interleaved and insert it to a third column in df
called combined
df['combined'] = pd.concat([customer_response_split, bot_response_split], axis=1).reindex(columns=new_columns).values.tolist()
Output :
Upvotes: 1
Reputation: 83
When you want to create a column whose value depends on more than one other column, DataFrame.apply
is a good way to do it. You might try something like this
def combine_responses(row):
cust_dil = [x for x in row.customer_response.str.split(';')]
bot_dil = [x for x in row.bot_response.str.split(';')]
## Write some code to interleave these.
## https://stackoverflow.com/questions/7946798/interleave-multiple-lists-of-the-same-length-in-python
return [val for pair in zip(cust_dil, bot_dil) for val in pair]
df['combined'] = df.apply(combine_responses, axis=1)
Upvotes: 3