How to slice a pandas dataframe based on unique values of a column inside a for loop and pass every slice to a function?

Question

Suppose I want a for loop that, on each iteration, slices a large pandas dataframe df by one of the unique values of a column, say A, and passes the sliced dataframe to a function that takes a dataframe as its argument, say fun(df). In other words, fun(df) receives a new slice on every iteration of the loop. For example, say the following is my dataframe:

A    B    C    D
1-1  an  at   23
1-2  ab  can  34
1-2  van bit  45
1-2  vd  sun  23
1-1  so  am   12
...

Now the first iteration of the for loop passes the dataframe below to fun(df):

A    B    C    D
1-1  an  at   23
1-1  so  am   12

and the next iteration passes this one to fun(df):

A    B    C    D
1-2  ab  can  34
1-2  van bit  45
1-2  vd  sun  23

and so on and so forth.

The number of iterations of the for loop should equal the number of unique values of 'A'. In this case it will be 2.

How can I do this in Python? I am new to it and don't know how to proceed.

Chris · Accepted Answer

Use pandas.DataFrame.groupby, which returns an iterable object.

def fun(data):
    # pseudo function for a test
    print(data)

for k, d in df.groupby('A'):
    fun(d)

Output:

     A   B   C   D
0  1-1  an  at  23
4  1-1  so  am  12

     A    B    C   D
1  1-2   ab  can  34
2  1-2  van  bit  45
3  1-2   vd  sun  23
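
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the dataframe contains only the five rows shown in the question (the rows elided by "..." are left out), and the seen list is a hypothetical addition used only to count how many times fun is called.

```python
import pandas as pd

# Sample data copied from the five visible rows of the question's table.
df = pd.DataFrame({
    "A": ["1-1", "1-2", "1-2", "1-2", "1-1"],
    "B": ["an", "ab", "van", "vd", "so"],
    "C": ["at", "can", "bit", "sun", "am"],
    "D": [23, 34, 45, 23, 12],
})

seen = []  # hypothetical bookkeeping list, not part of the original answer


def fun(data):
    # Stand-in for the real function: record the slice, then print it.
    seen.append(data)
    print(data)


# Each iteration yields one (key, sub-dataframe) pair; the key is unused here.
for _, d in df.groupby("A"):
    fun(d)

# fun was called once per unique value of column A.
print(len(seen) == df["A"].nunique())
```

By default groupby sorts the group keys, so the 1-1 slice is passed first. Each slice keeps the original row labels (0 and 4 for 1-1), which matches the output above.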

Insight

Iterating over the object returned by pandas.DataFrame.groupby yields (key, sub-dataframe) pairs, one per unique value of the grouping column.

In the line for k, d in df.groupby('A'), k and d unpack each pair: k is the group key (1-1, 1-2, ...) and d is the dataframe of rows that share that key. Your desired output does not use the key, so the answer ignores k and passes only d to fun.

fun in the answer stands for any function. As described in the question, it receives a new sliced dataframe on every iteration of the for loop. Here it is a simple print function that shows what iterating over df.groupby yields.
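
If the real function returns a value instead of printing, the group key becomes useful for labelling the results. The sketch below is only an illustration: total_d is a hypothetical function that sums column D, and the data is again limited to the five rows visible in the question.

```python
import pandas as pd

# Sample data copied from the five visible rows of the question's table.
df = pd.DataFrame({
    "A": ["1-1", "1-2", "1-2", "1-2", "1-1"],
    "B": ["an", "ab", "van", "vd", "so"],
    "C": ["at", "can", "bit", "sun", "am"],
    "D": [23, 34, 45, 23, 12],
})


def total_d(data):
    # Hypothetical replacement for fun: sum the D column of one slice.
    return int(data["D"].sum())


# Store each slice's result under its group key.
results = {key: total_d(group) for key, group in df.groupby("A")}
print(results)  # {'1-1': 35, '1-2': 102}
```

The dictionary has one entry per unique value of A, so its length equals the number of loop iterations asked about in the question.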
