Reputation: 341
Here i Read a file "userdata.xlsx":
ID Debt Email Age User
1 7.5 [email protected] 16 John
2 15 [email protected] 15 John
3 22 [email protected] 15 John
4 30 [email protected] 22 David
5 33 [email protected] 22 David
6 51 [email protected] 61 Fred
7 11 [email protected] 25 Fred
8 24 [email protected] 19 Eric
9 68 [email protected] 55 Terry
10 335 [email protected] 55 Terry
Here I group by User and create a spreadsheet per user and output it as its own .xlsx file like this:
ID Debt Email Age User
1 7.5 [email protected] 16 John
2 15 [email protected] 15 John
Here is the entire code:
#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd
df = pd.read_excel('userdata.xlsx')
grp = df.groupby('User')
for group in grp.groups:
grouptofile = (grp.get_group(group))
print(grouptofile)
print(group)
grouptofile.to_excel('%s.xlsx' % group , sheet_name='sheet1', index=False)
Now I only want to save selected columns to save for each user. Lets say I only want the columns "ID" and "Email" selected. I learned how to select only certain columns like this:
selected = df[['ID','Email']]
I now thought it would make sense to add the ID and Email here.
grp = df.groupby('User')
Added the "ID" and "Email"
grp = df[['ID', 'Email']].groupby('User')
Is it even possible to combine groupby and select columns?
#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd
df = pd.read_excel('userdata.xlsx')
grp = df[['ID', 'Email']].groupby('User')
for group in grp.groups:
grouptofile = (grp.get_group(group))
print(grouptofile)
print(group)
grouptofile.to_excel('%s.xlsx' % group , sheet_name='sheet1', index=False)
This is the Error im getting:
Traceback (most recent call last):
File "/Users/Barry/Documents/Python/Pandas/exelscript.py", line 22, in <module>
grp = df[['ID', 'Email']].groupby('User')
File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 6665, in groupby
observed=observed, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2152, in groupby
return klass(obj, by, **kwds)
File "/usr/local/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 599, in __init__
mutated=self.mutated)
File "/usr/local/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3291, in _get_grouper
raise KeyError(gpr)
KeyError: 'User'
Upvotes: 2
Views: 282
Reputation: 862691
I think you need specify columns in subset:
cols = ['ID', 'Email']
for i, group in df.groupby('User'):
group[cols].to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)
If get KeyError: 'User'
it means you want select column which not exist.
So if select columns ID
and Email
, then chained groupby cannot find User
column and raise error:
print (df[['ID', 'Email']])
ID Email
0 1 [email protected]
1 2 [email protected]
2 3 [email protected]
3 4 [email protected]
4 5 [email protected]
5 6 [email protected]
6 7 [email protected]
7 8 [email protected]
8 9 [email protected]
9 10 [email protected]
So is necessary select columns also use in groupby:
for i, group in df[['ID', 'Email', 'User']].groupby('User'):
group.to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)
Or select columns before writing to file like in first solution.
for i, group in df[['ID', 'Email', 'User']].groupby('User'):
group[cols].to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)
Upvotes: 1
Reputation: 6426
It's possible... but not quite how you're doing it.
You are effectively removing all of the columns but two, and then trying to group by a third column that doesn't exist any more. Instead, you'll need to group before you select the columns (though I don't know whether grouping in numpy
is a mutating operation, so you might need to make a copy first).
(Probably suboptimal) example:
grp = df[('ID', 'Email', 'User')].groupby('User')[('ID', 'Email')]
Upvotes: 0