Venkatesh Malhotra

Reputation: 251

How to check whether one column depends on another in a pandas dataframe

I have the following dataframe:

 import pandas as pd

 df = pd.DataFrame([[1, 11, 'a'], [1, 12, 'a'], [1, 11, 'a'], [1, 12, 'a'],
                    [1, 7, 'a'], [1, 12, 'a']])
 df.columns = ['id', 'code', 'name']

 df

    id  code name
0   1    11    a
1   1    12    a
2   1    11    a
3   1    12    a
4   1     7    a
5   1    12    a

As the dataframe above shows, the value of column "id" is directly related to the value of column "name". With, say, a million records, how can I tell whether one column is totally dependent on another column in a dataframe?
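One way to make "totally dependent" precise: column `id` determines column `name` if every `id` value always appears with exactly one `name` value. A minimal sketch of that check, assuming this definition, groups by the first column and counts distinct values of the second with `groupby(...).nunique()`. The helper name `determines` is hypothetical, not a pandas function.

```python
import pandas as pd


def determines(df: pd.DataFrame, a: str, b: str) -> bool:
    """Hypothetical helper: True if each value of column `a` maps to exactly one value of column `b`."""
    # Count distinct `b` values inside each `a` group; dropna=False treats NaN as a value.
    counts = df.groupby(a)[b].nunique(dropna=False)
    return bool((counts == 1).all())


df = pd.DataFrame(
    [[1, 11, 'a'], [1, 12, 'a'], [1, 11, 'a'], [1, 12, 'a'], [1, 7, 'a'], [1, 12, 'a']],
    columns=['id', 'code', 'name'],
)

print(determines(df, 'id', 'name'))  # True: id 1 always pairs with 'a'
print(determines(df, 'id', 'code'))  # False: id 1 pairs with 11, 12 and 7
print(determines(df, 'name', 'id'))  # True: 'a' always pairs with id 1
```

The check is directional: `determines(df, 'id', 'name')` and `determines(df, 'name', 'id')` can disagree when several ids share one name.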

Upvotes: 4

Views: 2520

Answers (1)

piRSquared

Reputation: 294506

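If the two columns are totally dependent on each other (a one-to-one mapping between their values), their factorizations will be identical: `factorize` assigns integer codes in order of first appearance, so matching values receive matching codes.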

(df['id'].factorize()[0] == df['name'].factorize()[0]).all()

True
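Equal factorizations test a two-way (one-to-one) relationship. If `id` determines `name` but several ids share the same name, the codes differ even though the dependency holds in one direction. A small sketch with made-up data (the frame below is illustrative, not from the question) shows this, along with one way to test a single direction: keep only the distinct (`id`, `name`) pairs and check that no `id` repeats.

```python
import pandas as pd

# Illustrative data: ids 2 and 3 both map to name 'b'
df = pd.DataFrame({'id': [1, 2, 1, 3, 2],
                   'name': ['a', 'b', 'a', 'b', 'b']})

# factorize numbers values in order of first appearance
id_codes, _ = pd.factorize(df['id'])      # codes: [0, 1, 0, 2, 1]
name_codes, _ = pd.factorize(df['name'])  # codes: [0, 1, 0, 1, 1]
same_codes = bool((id_codes == name_codes).all())
print(same_codes)  # False: the mapping is not one-to-one

# One-directional check: among distinct (id, name) pairs, a repeated id
# would mean that id appears with more than one name.
pairs = df.drop_duplicates(subset=['id', 'name'])
print(pairs['id'].is_unique)    # True: id determines name
print(pairs['name'].is_unique)  # False: name does not determine id
```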

Upvotes: 6
