user5779223

Reputation: 1490

How do I get the name of each group when applying a function with pandas groupby?

For instance, I have a function like this, which takes the name of a group and the length of that group and writes them to a file:

def func(name, len):
    with open("file.txt", "a") as f:
        # convert both values to str so they can be concatenated
        f.write(str(name)+str(len)+"\n")

How can I get the name of each group so that I can apply this function, like this:

df.groupby("id_").apply(lambda group: func(group.name, len(group))) 

Thank you in advance!

EDIT:

import os
import pandas as pd

def split_group_to_df(group, fullpath):
    group.apply(lambda df: write_stock_to_file(df, fullpath))

def write_stock_to_file(df, fullpath):
    with open(fullpath, 'a') as fwrite:
        # write the header only when the file is still empty
        if os.stat(fullpath).st_size == 0:
            df.to_csv(fwrite, index=False)
        else:
            df.to_csv(fwrite, index=False, header=False)

df = pd.read_csv("file.txt")
df.groupby('id_').apply(lambda group: split_group_to_df(group, group.name+'.txt'))

And the output is:

000008
92000000
12121

Each row of the original DataFrame is now split across several rows. Why?

Upvotes: 1

Views: 180

Answers (1)

jezrael

Reputation: 863256

I think the problem is with GroupBy.apply: if you use it with a function that writes to a file, some output is duplicated, because the function is called twice on the first group:

Docs:

Warning

In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.

In [123]: d = pd.DataFrame({"a":["x", "y"], "b":[1,2]})

In [124]: def identity(df):
   .....:     print(df)
   .....:     return df
   .....: 

In [125]: d.groupby("a").apply(identity)
   a  b
0  x  1
   a  b
0  x  1
   a  b
1  y  2
Out[125]: 
   a  b
0  x  1
1  y  2
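
To check how your own installation behaves, a minimal sketch like the following records one entry per call instead of printing. The helper name record and the list seen are made up for this illustration. I am assuming here that newer pandas releases evaluate the first group only once, so the list may contain either two or three entries depending on the version; please verify against your installed version.

```python
import pandas as pd

d = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

seen = []  # one entry per call of the applied function


def record(group):
    # Hypothetical helper: remember which group was passed in.
    # Column "b" is unique per group here, so it identifies the group.
    seen.append(int(group["b"].iloc[0]))
    return len(group)


d.groupby("a").apply(record)

# If the first group is evaluated twice, its value shows up twice here.
print(seen)
```

If the first value is repeated, any side effect inside the applied function (such as appending to a file) also happens twice for that group.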

So if you use:

import pandas as pd

df = pd.DataFrame({'id_':[1,2,3,4,1,2,3,1],
                   'name':[4,5,6,1,4,2,4,7]})

print (df)


def func(name, len):
    with open("file.txt", "a") as f:
        f.write(str(name)+str(len)+"\n")

df.groupby("id_").apply(lambda group: func(group.name, len(group))) 

The output file is:

0    4
4    4
7    7
Name: name, dtype: int643
13
22
32
41

Instead, I think you can use size with to_csv, which counts the rows of each group without calling apply:

print (df.groupby("id_").size().reset_index(name='count').to_csv(header=False, index=False, sep=' '))
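
If you need the group name together with the group itself, another option is to iterate over the GroupBy object, which yields (name, group) pairs and visits each group once. The sketch below is only an illustration: the temporary output directory and the file names are assumptions, not part of the original code. It also writes each group to its own file with a single to_csv call. In the EDIT code of the question, group.apply(...) is DataFrame.apply, which calls the function once per column, so each column is written as a separate Series. That is the likely reason the rows appear broken up.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'id_': [1, 2, 3, 4, 1, 2, 3, 1],
                   'name': [4, 5, 6, 1, 4, 2, 4, 7]})

# Hypothetical output location, used only for this illustration.
out_dir = tempfile.mkdtemp()
summary_path = os.path.join(out_dir, "file.txt")

with open(summary_path, "a") as f:
    # Iterating yields (group key, sub-DataFrame) pairs, one per group.
    for name, group in df.groupby("id_"):
        # Record the group name and its number of rows.
        f.write("{} {}\n".format(name, len(group)))
        # Write the whole group at once; no inner apply is needed.
        group.to_csv(os.path.join(out_dir, "{}.txt".format(name)), index=False)

with open(summary_path) as f:
    summary = f.read()

print(summary)
```

Because the loop body runs once per group, side effects such as file writes are not repeated for the first group.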

Upvotes: 1
