Reputation: 554
Suppose I want a for loop that, on each iteration, slices a big pandas dataframe df based on the unique values of one of its columns, say A, and then passes the sliced dataframe to a function that takes a dataframe as an argument, say fun(df). Basically, fun(df) will receive a new sliced dataframe on every iteration of the for loop.
For example, let's say the following is my dataframe:
A B C D
1-1 an at 23
1-2 ab can 34
1-2 van bit 45
1-2 vd sun 23
1-1 so am 12
...
The first iteration of the for loop should pass the following dataframe to fun(df):
A B C D
1-1 an at 23
1-1 so am 12
and the next iteration should pass this one to fun(df):
A B C D
1-2 ab can 34
1-2 van bit 45
1-2 vd sun 23
and so on and so forth.
The number of iterations of the for loop should equal the number of unique values in column 'A'; in this case, 2.
How can I do this in Python? I am new to it and don't know how to proceed.
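To reproduce the problem, the sample table can be built as a DataFrame. This is a minimal sketch that assumes A, B and C hold strings and D holds integers; it only contains the five rows shown, since the "..." rows are unknown.

```python
import pandas as pd

# The five rows shown in the question; the "..." rows are left out.
df = pd.DataFrame(
    {
        "A": ["1-1", "1-2", "1-2", "1-2", "1-1"],
        "B": ["an", "ab", "van", "vd", "so"],
        "C": ["at", "can", "bit", "sun", "am"],
        "D": [23, 34, 45, 23, 12],
    }
)
print(df)
```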
Upvotes: 0
Views: 2127
Reputation: 29742
Use pandas.DataFrame.groupby, which returns a GroupBy object that you can iterate over.
def fun(data):
    # placeholder function for testing; replace with your own logic
    print(data)

for k, d in df.groupby('A'):
    fun(d)
Output:
A B C D
0 1-1 an at 23
4 1-1 so am 12
A B C D
1 1-2 ab can 34
2 1-2 van bit 45
3 1-2 vd sun 23
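For comparison, the loop exactly as described in the question (one iteration per unique value of A) can also be written with a boolean mask. This is a minimal sketch using the sample rows from the question; fun here is a hypothetical placeholder that just prints its argument, and slices is an illustrative name. Note that this version rescans the whole frame once per unique value, so for a big dataframe with many distinct keys, groupby, which works out the grouping once, is usually preferable.

```python
import pandas as pd

# Sample data from the question (the "..." rows are omitted).
df = pd.DataFrame(
    {
        "A": ["1-1", "1-2", "1-2", "1-2", "1-1"],
        "B": ["an", "ab", "van", "vd", "so"],
        "C": ["at", "can", "bit", "sun", "am"],
        "D": [23, 34, 45, 23, 12],
    }
)


def fun(data):
    # Hypothetical stand-in for the real per-slice function.
    print(data, end="\n\n")


# unique() returns the distinct values of A in order of first appearance,
# so the loop runs exactly df["A"].nunique() times.
slices = []
for value in df["A"].unique():
    # Boolean mask: keep only the rows whose A equals the current value.
    sliced = df[df["A"] == value]
    slices.append(sliced)
    fun(sliced)
```

One difference from groupby: unique() keeps the order in which values first appear, while groupby sorts the keys by default.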
Insight
Iterating over the object returned by pandas.DataFrame.groupby yields (key, grouped dataframe) pairs, i.e. ((key, grouped dataframe), ...).
In the line for k, d in df.groupby('A'), k and d unpack each of those pairs. Since your desired output does not use the key (i.e. 1-1, 1-2, ...), the answer passes only d to fun.
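A short sketch, using the sample rows from the question, that makes those (key, sub-dataframe) pairs visible. The names gb and groups, and the _ placeholder, are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question (the "..." rows are omitted).
df = pd.DataFrame(
    {
        "A": ["1-1", "1-2", "1-2", "1-2", "1-1"],
        "B": ["an", "ab", "van", "vd", "so"],
        "C": ["at", "can", "bit", "sun", "am"],
        "D": [23, 34, 45, 23, 12],
    }
)

gb = df.groupby("A")
print(gb.ngroups)  # 2

# Iterating over the GroupBy object yields (key, sub-DataFrame) tuples.
groups = {}
for key, group in gb:
    groups[key] = group

# Keys come out sorted (groupby's default sort=True), and each
# sub-DataFrame keeps the original row labels of its rows.
print(list(groups))                  # ['1-1', '1-2']
print(groups["1-1"].index.tolist())  # [0, 4]

# A single group can also be fetched by its key without looping.
print(gb.get_group("1-2"))

# When the key is not needed, "_" is a conventional name for it.
for _, group in gb:
    print(len(group))                # 2, then 3
```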
fun in the answer stands for any function. As described in the question, fun is the function that receives a new sliced dataframe on every iteration of the for loop. Here it simply prints its argument, to give a visual representation of what iterating over df.groupby yields.
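Because fun can be any function, it can also return a value instead of printing. A minimal sketch, assuming a hypothetical fun that totals column D of each slice, collecting one result per key of A with the sample rows from the question:

```python
import pandas as pd

# Sample data from the question (the "..." rows are omitted).
df = pd.DataFrame(
    {
        "A": ["1-1", "1-2", "1-2", "1-2", "1-1"],
        "B": ["an", "ab", "van", "vd", "so"],
        "C": ["at", "can", "bit", "sun", "am"],
        "D": [23, 34, 45, 23, 12],
    }
)


def fun(data):
    # Hypothetical per-slice function: total of column D for this slice,
    # converted to a plain Python int.
    return int(data["D"].sum())


# One call to fun per unique value of A, keyed by that value.
results = {key: fun(group) for key, group in df.groupby("A")}
print(results)  # {'1-1': 35, '1-2': 102}
```

For a simple aggregation like this one, df.groupby("A")["D"].sum() gives the same numbers directly; the explicit loop is mainly worth it when fun does something that has no built-in groupby equivalent.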
Upvotes: 2