Reputation: 380
It's hard to make a short but descriptive title for this, but I have a DataFrame where each row is one line of a character's dialogue, and the whole corpus is the entire show. I want to create a dictionary whose keys are the top characters: loop through the DataFrame and append each dialogue line to that character's value, which I want to be a list.
I have a column called 'Character' and a column called 'dialogue':
Character   dialogue
PICARD      'You will agree Data that Starfleets order are...'
DATA        'Difficult? Simply solve the mystery of Farpoint Station.'
PICARD      'As simple as that.'
TROI        'Farpoint Station. Even the name sounds mysterious.'
And so on and so on... There are many minor characters, so I only want the top 10 characters by dialogue count, and I have a list of them called main_chars. I want a final dictionary where each character is a key and the value is a huge list of all their lines. I don't know how to append to an empty list set up as the value for each key. My code thus far is:
char_corpuses = {}
for label, row in df.iterrows():
    for char in main_chars:
        if row['Character'] == char:
            char_corpuses[char] = [row['dialogue']]
But the result contains only the last line each character says in the corpus:
{'PICARD': [' so five card stud nothing wild and the skys the limit'],
'DATA': [' would you care to deal sir'],
'TROI': [' you were always welcome'],
'WORF': [' agreed'],
'Q': [' youll find out in any case ill be watching and if youre very lucky ill drop by to say hello from time to time see you out there'],
'RIKER': [' of course have a seat'],
'WESLEY': [' i will bye mom'],
'CRUSHER': [' you know i was thinking about what the captain told us about the future about how we all changed and drifted apart why would he want to tell us whats to come'],
'LAFORGE': [' sure goes against everything weve heard about not polluting the time line doesnt it'],
'GUINAN': [' thank you doctor this looks like a great racquet but er i dont play tennis never have']}
How do I get it to keep every line, instead of overwriting the previous value so that only the last line for each character survives?
Upvotes: 0
Views: 99
Reputation: 239
TopHowmany = 10  # Change this to however many top characters you want.
top_chars = df['Character'].value_counts().head(TopHowmany).index
subDF = df[df['Character'].isin(top_chars)]
char_corpuses = {}
for x in subDF.index:
    char = subDF.loc[x, 'Character']
    dialogue = subDF.loc[x, 'dialogue']
    if char in char_corpuses:
        char_corpuses[char].append(dialogue)  # append the line itself, not the string 'dialogue'
    else:
        char_corpuses[char] = [dialogue]
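A self-contained version of this approach, wrapped in a hypothetical helper `top_n_corpuses` and using `dict.setdefault` in place of the if/else (same behavior). The toy rows are placeholders, not lines from the show. If two characters tie at the cutoff, which one makes the list depends on pandas' sort order, so it may not be predictable.

```python
import pandas as pd


def top_n_corpuses(df, n):
    """Hypothetical helper: map each of the n most frequent speakers to a list of their lines."""
    # value_counts() sorts by count, most frequent first; head(n) keeps the top n
    top_chars = df['Character'].value_counts().head(n).index
    subset = df[df['Character'].isin(top_chars)]
    corpuses = {}
    # zip over the two columns avoids the per-row overhead of iterrows()/loc
    for char, line in zip(subset['Character'], subset['dialogue']):
        # setdefault inserts an empty list the first time a character is seen
        corpuses.setdefault(char, []).append(line)
    return corpuses


# Placeholder data: PICARD speaks 3 times, DATA 2 times, TROI once
toy = pd.DataFrame({
    'Character': ['PICARD', 'DATA', 'PICARD', 'TROI', 'DATA', 'PICARD'],
    'dialogue': ['p1', 'd1', 'p2', 't1', 'd2', 'p3'],
})
result = top_n_corpuses(toy, 2)
print(result)
```

With `n=2`, TROI is filtered out and each remaining character keeps all of their lines in their original order.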
Upvotes: 0
Reputation: 492
This line, char_corpuses[char] = [row['dialogue']], replaces whatever list was stored under that key with a brand-new one-element list holding the current dialogue line, every time the loop runs. It assigns rather than appends, so only the last line survives.
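A tiny plain-Python illustration of the difference between rebinding the key and appending to the list stored under it (the two sample lines are made up):

```python
lines = [('PICARD', 'first'), ('PICARD', 'second')]

overwritten = {}
appended = {}
for char, line in lines:
    # Rebinding the key to a fresh one-element list discards the previous list
    overwritten[char] = [line]
    # Creating the list once, then appending, keeps every line
    if char not in appended:
        appended[char] = []
    appended[char].append(line)

print(overwritten)  # {'PICARD': ['second']}
print(appended)     # {'PICARD': ['first', 'second']}
```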
For a 'vanilla' dictionary try:
import pandas

d = {'Character': ['PICARD', 'DATA', 'PICARD'], 'dialogue': ['You will agree Data that Starfleets order are...', 'Difficult? Simply solve the mystery of Farpoint Station.', 'As simple as that.']}
df = pandas.DataFrame(data=d)
main_chars = ['PICARD', 'DATA']
char_corpuses = {}
for label, row in df.iterrows():
    for char in main_chars:
        if row['Character'] == char:
            try:
                # Try to append the current dialogue line to the existing list
                char_corpuses[char].append(row['dialogue'])
            except KeyError:
                # The key doesn't exist yet, so create an empty list for char first
                char_corpuses[char] = []
                char_corpuses[char].append(row['dialogue'])
Output
{'PICARD': ['You will agree Data that Starfleets order are...', 'As simple as that.'], 'DATA': ['Difficult? Simply solve the mystery of Farpoint Station.']}
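The try/except KeyError pattern works; an alternative sketch swaps in the standard library's `collections.defaultdict(list)`, which starts every missing key as an empty list automatically, and iterates with `zip` over the two columns instead of `iterrows()`. The sample data is the same as above.

```python
from collections import defaultdict

import pandas as pd

d = {'Character': ['PICARD', 'DATA', 'PICARD'],
     'dialogue': ['You will agree Data that Starfleets order are...',
                  'Difficult? Simply solve the mystery of Farpoint Station.',
                  'As simple as that.']}
df = pd.DataFrame(data=d)
main_chars = ['PICARD', 'DATA']
wanted = set(main_chars)  # set membership is a constant-time check

char_corpuses = defaultdict(list)
for char, line in zip(df['Character'], df['dialogue']):
    if char in wanted:
        # A missing key is created as [] on first access, then appended to
        char_corpuses[char].append(line)

char_corpuses = dict(char_corpuses)  # convert back to a plain dict
print(char_corpuses)
```

The printed dictionary matches the output shown above.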
Upvotes: 1
Reputation: 51
Try something like this ^^
char_corpuses = {}
for char in main_chars:
    # The boolean mask selects that character's rows; tolist() returns a plain list
    char_corpuses[char] = df.loc[df['Character'] == char, 'dialogue'].tolist()
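If you'd rather skip the Python loop entirely, pandas can build the same mapping with `groupby`. This is a sketch assuming the column names from the question; the sample rows come from the question's excerpt. `groupby` sorts the keys alphabetically by default, so the dictionary order differs from `main_chars`, while each character's lines keep their original order.

```python
import pandas as pd

df = pd.DataFrame({
    'Character': ['PICARD', 'DATA', 'PICARD', 'TROI'],
    'dialogue': ['You will agree Data that Starfleets order are...',
                 'Difficult? Simply solve the mystery of Farpoint Station.',
                 'As simple as that.',
                 'Farpoint Station. Even the name sounds mysterious.'],
})
main_chars = ['PICARD', 'DATA']

char_corpuses = (
    df[df['Character'].isin(main_chars)]  # keep only the main characters
    .groupby('Character')['dialogue']     # one group per character
    .apply(list)                          # collect each group's lines into a list
    .to_dict()                            # Series of lists -> plain dict
)
print(char_corpuses)
```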
Upvotes: 1