Reputation: 1490
For instance, I have a function like this, which writes the name of a group and its length to a file:
def func(name, len):
    with open("file.txt", "a") as f:
        f.write(name+len+"\n")
How can I get the name of each group so that I can apply this function, like this:
df.groupby("id_").apply(lambda group: func(group.name, len(group)))
Thank you in advance!
EDIT:
import os
import pandas as pd

def split_group_to_df(group, fullpath):
    group.apply(lambda df: write_df_to_file(df, fullpath))

def write_df_to_file(df, fullpath):
    with open(fullpath, 'a') as fwrite:
        # Write the header only while the file is still empty
        if os.stat(fullpath).st_size == 0:
            df.to_csv(fwrite, index=False)
        else:
            df.to_csv(fwrite, index=False, header=False)

df = pd.read_csv("file.txt")
df.groupby('id_').apply(lambda group: split_group_to_df(group, group.name+'.txt'))
And the output is:
000008
92000000
12121
Each row of the original DataFrame is now split across several rows. Why?
Upvotes: 1
Views: 180
Reputation: 863256
I think the problem is with GroupBy.apply when you use it with a function that writes to a file, because the function is called twice on the first group:
Docs:
Warning
In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.
In [123]: d = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
In [124]: def identity(df):
   .....:     print(df)
   .....:     return df
   .....:
In [125]: d.groupby("a").apply(identity)
   a  b
0  x  1
   a  b
0  x  1
   a  b
1  y  2
Out[125]:
   a  b
0  x  1
1  y  2
So if you use:
import pandas as pd
df = pd.DataFrame({'id_': [1, 2, 3, 4, 1, 2, 3, 1],
                   'name': [4, 5, 6, 1, 4, 2, 4, 7]})
print(df)
def func(name, len):
    with open("file.txt", "a") as f:
        f.write(str(name) + str(len) + "\n")
df.groupby("id_").apply(lambda group: func(group.name, len(group)))
The output file is:
0    4
4    4
7    7
Name: name, dtype: int643
13
22
32
41
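One way to avoid relying on how apply calls the function is to iterate over the groups explicitly, so the write happens exactly once per group. This is only a minimal sketch using the sample df from this answer; the helper name write_group_sizes and its path argument are made up for illustration, and it opens the file with "w" rather than "a" so that repeated runs do not accumulate lines.

```python
import pandas as pd

df = pd.DataFrame({'id_': [1, 2, 3, 4, 1, 2, 3, 1],
                   'name': [4, 5, 6, 1, 4, 2, 4, 7]})

def write_group_sizes(frame, key, path):
    """Hypothetical helper: write one '<group key><group size>' line per group."""
    with open(path, "w") as f:
        # Iterating over a GroupBy yields (key, sub-DataFrame) pairs,
        # one pair per group, so the side effect runs once per group.
        for name, group in frame.groupby(key):
            f.write(str(name) + str(len(group)) + "\n")

write_group_sizes(df, "id_", "file.txt")
```

With the sample data this should leave only the four per-group lines in file.txt, without the extra block at the top.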
I think you can use size with to_csv instead:
print(df.groupby("id_").size().reset_index(name='count').to_csv(header=False, index=False, sep=' '))
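To make the size approach concrete, here is a small sketch with the same sample df. It assumes you want the counts in a file rather than printed; the file name counts.txt is just an example, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'id_': [1, 2, 3, 4, 1, 2, 3, 1],
                   'name': [4, 5, 6, 1, 4, 2, 4, 7]})

# size() counts the rows in each group; reset_index turns the result
# into a two-column DataFrame with columns 'id_' and 'count'.
counts = df.groupby("id_").size().reset_index(name="count")

# Without a path, to_csv returns the CSV text as a string.
text = counts.to_csv(header=False, index=False, sep=" ")

# With a path, to_csv writes the file itself, so no custom
# file-writing function has to go through apply at all.
counts.to_csv("counts.txt", header=False, index=False, sep=" ")
```

Because nothing with side effects is passed to apply here, it does not matter how many times pandas evaluates the first group internally.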
Upvotes: 1