Reputation: 6869
I have a simple dataframe that has emails being sent to different receivers:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
'Emails': [9, 3, 5, 1, 6, 7]
})
df
That looks like this:
Emails Receiver Sender
0 9 David Alice
1 3 Eric Alice
2 5 Frank Bob
3 1 Ginger Carl
4 6 Holly Bob
5 7 Ingrid Alice
For each sender, I can get a list of receivers by performing a groupby along with a custom aggregation:
grouped = df.groupby('Sender')
grouped.agg({'Receiver': (lambda x: list(x)),
'Emails': np.sum
})
Which produces this dataframe output:
Emails Receiver
Sender
Alice 19 [David, Eric, Ingrid]
Bob 11 [Frank, Holly]
Carl 1 [Ginger]
I want to write the dataframe to a file (not a CSV since it will be jagged) with spaces separating each element (including splitting out the list) so it would look like this:
Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger
I'd could iterate over each row and write the contents to a file but I was wondering if there was a better approach to get the same output starting from the original dataframe?
Upvotes: 0
Views: 422
Reputation: 61910
You are almost there, just use ' '.join
as the aggregating function for the Receiver
column:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
'Emails': [9, 3, 5, 1, 6, 7]
})
grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': ' '.join,
'Emails': np.sum
})
print(result)
Output
Receiver Emails
Sender
Alice David Eric Ingrid 19
Bob Frank Holly 11
Carl Ginger 1
For the sake of completeness, if the Receiver
column where int
instead of strings you could transform to string first and then join:
df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
'Receiver': [1, 2, 3, 4, 5, 6],
'Emails': [9, 3, 5, 1, 6, 7]
})
grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': lambda x: ' '.join(map(str, x)),
'Emails': np.sum
})
print(result)
Output
Receiver Emails
Sender
Alice 1 2 6 19
Bob 3 5 11
Carl 4 1
Upvotes: 0
Reputation: 12992
You can do that using like so:
output_file = './out.txt'
with open(output_file, 'w') as fout:
for group, df in grouped:
fout.write('{} {} {}\n'.format(group,
sum(df['Emails'].values),
' '.join(df['Receiver'].values)))
Now, the out.txt
file will be:
Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger
Upvotes: 1