Reputation: 324
I have a simple data frame that I want to group by column 'A' with groupby, then generate a new column computed by a user-defined function (with a loop inside) that takes values from columns 'B' and 'C'. My problem is that I can apply the function to the whole data frame but not to the grouped data frame (Exception: Column(s) B already selected). I don't know why it throws an error on the grouped data frame but not on the whole data frame. My implementation is below:
>>> import pandas as pd
>>>
>>> df = pd.read_csv("foo.txt", sep="\t")
>>> df
A B C
0 1 4 3
1 1 5 4
2 1 2 10
3 2 7 2
4 2 4 4
5 2 6 6
>>>
>>> def calc(data):
...     length = len(data['B'])
...     mx = data['B'][0]
...     nx = data['C'][0]
...     for i in range(1,length):
...         my = data['B'][i]
...         ny = data['C'][i]
...         nx = nx + ny
...         mx=(mx*nx+my*ny)/(nx+ny)
...     return(mx)
...
>>> df_grouped = df.groupby(['A'])
>>> calc(df)
4.217694879423274
>>> calc(df_grouped)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in calc
  File "/mnt/projects/kokep/kokep/devel/miniconda3/lib/python3.6/site-packages/pandas/core/base.py", line 250, in __getitem__
    .format(selection=self._selection))
Exception: Column(s) B already selected
>>>
How can I get this to work? Thanks in advance.
Upvotes: 0
Views: 59
Reputation: 324
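I figured out the problem. I think reset_index needs to be applied to each group. Each group keeps the row labels of the original data frame (the second group is labelled 3, 4, 5), so a lookup like data['B'][0] inside calc only works once the group's index is reset to start at 0: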
>>> import pandas as pd
>>>
>>> df = pd.read_csv("foo.txt", sep="\t")
>>> df
A B C
0 1 4 3
1 1 5 4
2 1 2 10
3 2 7 2
4 2 4 4
5 2 6 6
>>>
>>> def calc(data):
...     length = len(data['B'])
...     mx = data['B'][0]
...     nx = data['C'][0]
...     for i in range(1,length):
...         my = data['B'][i]
...         ny = data['C'][i]
...         nx = nx + ny
...         mx=(mx*nx+my*ny)/(nx+ny)
...     return(mx)
...
>>> result = []
>>> for name, group in df.groupby('A'):
...     group = pd.DataFrame(group).reset_index()
...     out = calc(group)
...     result.append(out)
...
>>> result
[3.488215488215488, 5.866666666666666]
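For what it's worth, the explicit loop over groups can probably be replaced by groupby(...).apply. This is only a sketch under a few assumptions: the data frame is rebuilt inline instead of being read from foo.txt, calc is rewritten to read values by position via to_numpy() (which needs pandas 0.24 or later) so that no reset_index is required, and columns 'B' and 'C' are selected before apply so the function only sees the columns it uses.

```python
import pandas as pd

# Same data as in the question, built inline instead of reading foo.txt.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

def calc(data):
    # Positional arrays, so the group's original row labels do not matter.
    b = data["B"].to_numpy()
    c = data["C"].to_numpy()
    mx = b[0]
    nx = c[0]
    for i in range(1, len(b)):
        my = b[i]
        ny = c[i]
        nx = nx + ny
        mx = (mx * nx + my * ny) / (nx + ny)
    return mx

# One value per group, returned as a Series indexed by the values of 'A'.
result = df.groupby("A")[["B", "C"]].apply(calc)
print(result)
```

The values should match the list from the loop above (about 3.4882 for group 1 and 5.8667 for group 2), with the added benefit that the group key is kept as the index of the result.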
Upvotes: 1
Reputation: 351
I think your groupby is producing a pandas.Series and your function is not being applied to that series. I tried playing with different groupby methods, but for some reason it's not working. Once I find a solution, I will post it here.
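One way to check what groupby actually returns is to inspect the object directly. The snippet below is a minimal sketch that assumes the same data as in the question, rebuilt inline rather than read from foo.txt. It prints the class name of the grouped object and the row labels that each group carries.

```python
import pandas as pd

# Same data as in the question, built inline instead of reading foo.txt.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

grouped = df.groupby("A")

# The grouped object is a lazy grouping wrapper, not a Series or DataFrame.
print(type(grouped).__name__)  # DataFrameGroupBy

# Iterating yields (key, sub-DataFrame) pairs; each sub-frame keeps the
# row labels it had in the original data frame.
group_labels = {int(name): group.index.tolist() for name, group in grouped}
print(group_labels)  # {1: [0, 1, 2], 2: [3, 4, 5]}
```

Since the second group has no row labelled 0, a label-based lookup such as data['B'][0] would be expected to fail on it unless the index is reset or the values are read by position.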
Upvotes: 0