Reputation: 109
I have a question regarding my dataframe. Specifically, in one column, for each row, I have a list of speakers and speeches. Now, I want to choose exactly one speech, based on whether the speaker is the one I am looking for, which is noted within another column. So one column provides the last name I am looking for and the other column provides a list of all speakers (first and last name) and their speech and I want to create a new columns where this speech is stored in the respective row.
So my initial dataset looks like this:
ticker year quarter exel_lname jobposition speech
xx 2009 1 Angle CEO [("Mike Angle", "Thank you"), ("Barbara Barth", "It is")]
xx 2009 1 Barth CFO [("Mike Angle", "Thank you"), ("Barbara Barth", "It is")]
xx 2009 2 Angle CEO [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I")]
xx 2009 2 Barth CFO [("Mike Angle", "I am surprised"), ("Barbara Barth", "So am I")]
yy 2008 3 Cruz CEO [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you")]
yy 2008 3 Dolm CFO [("Damien Cruz", "Hello"), ("Lara Dolm", "Nice to meet you")]
For row one for instance, I want to check each key-value pair whether the first list entry contains the last name, if no continue, if yes, take the speech part (i.e. second list entry) and store it in new column. As such, I want the following dataset (I hid the initial column speech here, but it should still be contained, so I do not want to replace it, just create a new column).
ticker year quarter exel_lname jobposition speechmanager
xx 2009 1 Angle CEO "Thank you"
xx 2009 1 Barth CFO "It is"
xx 2009 2 Angle CEO "I am surprised"
xx 2009 2 Barth CFO "So am I"
yy 2008 3 Cruz CEO "Hello"
yy 2008 3 Dolm CFO "Nice to meet you"
Could someone help me how to solve this in Python 3?
Thank you!! Julia
Upvotes: 1
Views: 71
Reputation: 5215
This is perhaps best accomplished by writting a function, and then applying it row-wise:
def get_speech(row):
matches = list(filter(lambda x: x[0].endswith(row['exel_lname']), row['speech']))
if len(matches) > 0:
return matches[0][1]
return ''
df['speechmanager'] = df.apply(get_speech, axis=1)
Upvotes: 2