justanewb
justanewb

Reputation: 133

Find the mode for a pandas column based on filtering on another pandas column

I have a dataframe that looks similar to this

df = pd.DataFrame({'id': [1001, 1002, 1003, 1004, 1005, 1006]
                  'resolution_modified': ['It is recommended to replace scanner',
                                          'It is recommended to replace scanner',
                                          'It is recommended to replace laptop',
                                          'It is recommended to replace laptop',
                                          'It is recommended to replace printer',
                                          'It is recommended to replace printer'],
                   'cluster':[1,1,2,2,3,3]})

I want to find the string in resolution_modified that occurs the most for each unique cluster such that I will have a map where the key is the cluster and the value would be the mode string in the resolution_modified column.

This is what I have tried

# Get the string that occurs the most for each unqiue cluster
mode_string = {}
for cluster in hardware['cluster'].unique():
    if hardware[hardware['cluster']==cluster]:
        mode_string[cluster] = hardware['resolution_modified'].mode()[0]
mode_string

This did not work and throws an error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 0

Views: 480

Answers (2)

Dani Mesejo
Dani Mesejo

Reputation: 61910

The pandas way of doing that is to group by cluster and find the mode of resolution_modified:

res = df.groupby('cluster')['resolution_modified'].agg(pd.Series.mode)
mode_string = res.to_dict()
print(mode_string)

Output

{1: 'It is recommended to replace scanner', 2: 'It is recommended to replace laptop', 3: 'It is recommended to replace printer'}

See more on the documentation of agg and mode.

As an alternative, you could use statistics.mode:

from statistics import mode
res = df.groupby('cluster')['resolution_modified'].agg(mode)

Upvotes: 1

Pablo C
Pablo C

Reputation: 4761

You can use pandas.DataFrame.groupby with pandas.Series.mode:

mode_string = df.groupby("cluster")["resolution_modified"].agg(pd.Series.mode)

#cluster
#1       It is recommended to replace scanner
#2       It is recommended to replace laptop
#3       It is recommended to replace printer

You can also convert it to dict

mode_string = mode_string.to_dict()

#{1: 'It is recommended to replace scanner', 2: 'It is recommended to replace laptop', 3: 'It is recommended to replace printer'}

In both cases you can do:

mode_string[1]
#'It is recommended to replace scanner'

Upvotes: 2

Related Questions