Ben Smith

Reputation: 380

Loop through a pandas DataFrame and append values to a list stored as a dictionary value, keyed by a conditional value

It is hard to write a short but descriptive title for this. I have a dataframe where each row is one line spoken by a character, and the whole corpus covers the entire show. I want to create a dictionary whose keys are the top characters, then loop through the DF and append each dialogue line to that character's value, which should be a list.

I have a column called 'Character' and a column called 'dialogue':

Character      dialogue
PICARD         'You will agree Data that Starfleets order are...'
DATA           'Difficult? Simply solve the mystery of Farpoint Station.'
PICARD         'As simple as that.'
TROI           'Farpoint Station. Even the name sounds mysterious.'

And so on. There are many minor characters, so I only want the top 10 characters by dialogue count, and I have a list of them called main_chars. I want a final dictionary where each character is a key and the value is one long list of all their lines. I don't know how to append to an empty list set up as the value for each key. My code so far is:

char_corpuses = {} 
for label, row in df.iterrows():
    for char in main_chars:
        if row['Character'] == char:
            char_corpuses[char] = [row['dialogue']]

But the end result contains only the last line each character says in the corpus:

{'PICARD': [' so five card stud nothing wild and the skys the limit'],
 'DATA': [' would you care to deal sir'],
 'TROI': [' you were always welcome'],
 'WORF': [' agreed'],
 'Q': [' youll find out in any case ill be watching and if youre very lucky ill drop by to say hello from time to time see you out there'],
 'RIKER': [' of course have a seat'],
 'WESLEY': [' i will bye mom'],
 'CRUSHER': [' you know i was thinking about what the captain told us about the future about how we all changed and drifted apart why would he want to tell us whats to come'],
 'LAFORGE': [' sure goes against everything weve heard about not polluting the time line doesnt it'],
 'GUINAN': [' thank you doctor this looks like a great racquet but er i dont play tennis never have']}

How do I keep the earlier lines, instead of ending up with only the last line for each character?

Upvotes: 0

Views: 99

Answers (3)

theDBA

Reputation: 239

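TopHowmany = 10         # Number of top characters to keep; change as needed.

# Keep only the rows spoken by the TopHowmany most frequent characters.
subDF = df[df.Character.isin(df.Character.value_counts().head(TopHowmany).index)]

char_corpuses = {}
for x in subDF.index:
    char = subDF.loc[x, 'Character']
    dialogue = subDF.loc[x, 'dialogue']
    if char in char_corpuses:
        # Append the line itself, not the literal string 'dialogue'.
        char_corpuses[char].append(dialogue)
    else:
        # First line for this character: start a new list.
        char_corpuses[char] = [dialogue]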

Upvotes: 0

L.Clarkson

Reputation: 492

The line char_corpuses[char] = [row['dialogue']] replaces whatever was stored under that key with a new one-element list holding the current dialogue line, every time the loop runs. It assigns a single element rather than appending to the existing list.

For a 'vanilla' dictionary try:

import pandas
d = {'Character': ['PICARD', 'DATA', 'PICARD'], 'dialogue': ['You will agree Data that Starfleets order are...', 'Difficult? Simply solve the mystery of Farpoint Station.', 'As simple as that.']}
df = pandas.DataFrame(data=d)
main_chars = ['PICARD', 'DATA']
char_corpuses = {}


for label, row in df.iterrows():
    for char in main_chars:
        if row['Character'] == char:
            try:
                # Try to append the current dialogue line to array
                char_corpuses[char].append(row['dialogue'])
            except KeyError:
                # The key doesn't exist yet, create empty list for the key [char]
                char_corpuses[char] = []
                char_corpuses[char].append(row['dialogue'])

Output

{'PICARD': ['You will agree Data that Starfleets order are...', 'As simple as that.'], 'DATA': ['Difficult? Simply solve the mystery of Farpoint Station.']}
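
If you would rather avoid the try/except, collections.defaultdict from the standard library creates the empty list for you the first time a key is used. This is only a minimal sketch: the small DataFrame and main_chars below are stand-ins built from the sample lines in the question, not your real data.

```python
from collections import defaultdict

import pandas as pd

# Stand-in data taken from the sample rows in the question.
df = pd.DataFrame({
    "Character": ["PICARD", "DATA", "PICARD", "TROI"],
    "dialogue": [
        "You will agree Data that Starfleets order are...",
        "Difficult? Simply solve the mystery of Farpoint Station.",
        "As simple as that.",
        "Farpoint Station. Even the name sounds mysterious.",
    ],
})
main_chars = ["PICARD", "DATA"]  # assumed list of major characters

# defaultdict(list) returns a fresh empty list for any missing key,
# so append() works even the first time a character is seen.
char_corpuses = defaultdict(list)
for character, line in zip(df["Character"], df["dialogue"]):
    if character in main_chars:
        char_corpuses[character].append(line)

# Convert back to a plain dict if the default behaviour is not wanted later.
char_corpuses = dict(char_corpuses)
print(char_corpuses)
```

Iterating over the two columns with zip also avoids iterrows, which tends to be slow on large frames.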

Upvotes: 1

Sabri B

Reputation: 51

Try something like this ^^

char_corpuses = {}
for char in main_chars:
  char_corpuses[char] = df[df['Character'] == char]['dialogue'].tolist()
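
The same idea can probably be written without an explicit loop by filtering once and grouping. The sketch below assumes the column names from the question ('Character' and 'dialogue') and uses a tiny stand-in DataFrame plus an assumed main_chars list.

```python
import pandas as pd

# Stand-in data taken from the sample rows in the question.
df = pd.DataFrame({
    "Character": ["PICARD", "DATA", "PICARD", "TROI"],
    "dialogue": [
        "You will agree Data that Starfleets order are...",
        "Difficult? Simply solve the mystery of Farpoint Station.",
        "As simple as that.",
        "Farpoint Station. Even the name sounds mysterious.",
    ],
})
main_chars = ["PICARD", "DATA"]  # assumed list of major characters

# Keep only the major characters, then collect each character's lines
# into a Python list and turn the resulting Series into a dict.
major_lines = df[df["Character"].isin(main_chars)]
char_corpuses = major_lines.groupby("Character")["dialogue"].agg(list).to_dict()
print(char_corpuses)
```

Each list keeps the lines in their original row order, and characters outside main_chars never appear as keys.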

Upvotes: 1
