slaw
slaw

Reputation: 6869

Write Pandas DataFrame with List in Column to a File

I have a simple dataframe that has emails being sent to different receivers:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
                   'Emails': [9, 3, 5, 1, 6, 7]
                  })
df

That looks like this:

    Emails  Receiver    Sender
0   9       David       Alice
1   3       Eric        Alice
2   5       Frank       Bob
3   1       Ginger      Carl
4   6       Holly       Bob
5   7       Ingrid      Alice

For each sender, I can get a list of receivers by performing a groupby along with a custom aggregation:

grouped = df.groupby('Sender')
grouped.agg({'Receiver': (lambda x: list(x)),
                   'Emails': np.sum
                  })

Which produces this dataframe output:

        Emails  Receiver
Sender      
Alice   19      [David, Eric, Ingrid]
Bob     11      [Frank, Holly]
Carl    1       [Ginger]

I want to write the dataframe to a file (not a CSV since it will be jagged) with spaces separating each element (including splitting out the list) so it would look like this:

Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger

I'd could iterate over each row and write the contents to a file but I was wondering if there was a better approach to get the same output starting from the original dataframe?

Upvotes: 0

Views: 422

Answers (2)

Dani Mesejo
Dani Mesejo

Reputation: 61910

You are almost there, just use ' '.join as the aggregating function for the Receiver column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
                   'Emails': [9, 3, 5, 1, 6, 7]
                   })

grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': ' '.join,
             'Emails': np.sum
             })

print(result)

Output

                 Receiver  Emails
Sender                           
Alice   David Eric Ingrid      19
Bob           Frank Holly      11
Carl               Ginger       1

For the sake of completeness, if the Receiver column where int instead of strings you could transform to string first and then join:

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': [1, 2, 3, 4, 5, 6],
                   'Emails': [9, 3, 5, 1, 6, 7]
                   })

grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': lambda x: ' '.join(map(str, x)),
                      'Emails': np.sum
                      })

print(result)

Output

       Receiver  Emails
Sender                 
Alice     1 2 6      19
Bob         3 5      11
Carl          4       1

Upvotes: 0

Anwarvic
Anwarvic

Reputation: 12992

You can do that using like so:

output_file = './out.txt'
with open(output_file, 'w') as fout:
    for group, df in grouped:
        fout.write('{} {} {}\n'.format(group,
                                       sum(df['Emails'].values),
                                       ' '.join(df['Receiver'].values)))

Now, the out.txt file will be:

Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger

Upvotes: 1

Related Questions