Find the mode for a pandas column based on filtering on another pandas column

Question

I have a dataframe that looks similar to this:

import pandas as pd

df = pd.DataFrame({'id': [1001, 1002, 1003, 1004, 1005, 1006],
                   'resolution_modified': ['It is recommended to replace scanner',
                                           'It is recommended to replace scanner',
                                           'It is recommended to replace laptop',
                                           'It is recommended to replace laptop',
                                           'It is recommended to replace printer',
                                           'It is recommended to replace printer'],
                   'cluster': [1, 1, 2, 2, 3, 3]})

For each unique cluster, I want to find the string in resolution_modified that occurs most often, so that I end up with a map where the key is the cluster and the value is the mode string of the resolution_modified column.

This is what I have tried:

# Get the string that occurs the most for each unique cluster
mode_string = {}
for cluster in hardware['cluster'].unique():
    if hardware[hardware['cluster']==cluster]:
        mode_string[cluster] = hardware['resolution_modified'].mode()[0]
mode_string

This does not work and raises an error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Pablo C · Accepted Answer

You can use pandas.DataFrame.groupby with pandas.Series.mode:

mode_string = df.groupby("cluster")["resolution_modified"].agg(pd.Series.mode)

#cluster
#1       It is recommended to replace scanner
#2       It is recommended to replace laptop
#3       It is recommended to replace printer
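One caveat worth sketching: pd.Series.mode returns every tied value, so a cluster with a tie ends up holding an array instead of a single string. Assuming you want exactly one string per cluster, one option is to keep only the first mode. The data and the names tie_df, first_mode and first_mode_dict below are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical data: cluster 1 has a tie between two strings
tie_df = pd.DataFrame({
    'resolution_modified': ['replace scanner', 'replace laptop',
                            'replace printer', 'replace printer'],
    'cluster': [1, 1, 2, 2],
})

# s.mode() returns a Series of all modes; .iloc[0] keeps just one of them
first_mode = (
    tie_df.groupby('cluster')['resolution_modified']
          .agg(lambda s: s.mode().iloc[0])
)

# Plain dict keyed by cluster, one string per key
first_mode_dict = first_mode.to_dict()
```

Which of the tied strings is kept depends on the order in which mode() returns them, so if the choice matters you may want an explicit tie-breaking rule instead.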

You can also convert it to a dict:

mode_string = mode_string.to_dict()

#{1: 'It is recommended to replace scanner', 2: 'It is recommended to replace laptop', 3: 'It is recommended to replace printer'}

In both cases you can do:

mode_string[1]
#'It is recommended to replace scanner'
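For reference, the loop from the question fails because `if` is applied to a whole filtered DataFrame, which has no single truth value. Even without that check, it would take the mode of the entire column rather than of one cluster. A minimal sketch of a working loop, assuming the dataframe is named df as in the sample data, might look like this:

```python
import pandas as pd

df = pd.DataFrame({'id': [1001, 1002, 1003, 1004, 1005, 1006],
                   'resolution_modified': ['It is recommended to replace scanner',
                                           'It is recommended to replace scanner',
                                           'It is recommended to replace laptop',
                                           'It is recommended to replace laptop',
                                           'It is recommended to replace printer',
                                           'It is recommended to replace printer'],
                   'cluster': [1, 1, 2, 2, 3, 3]})

mode_string = {}
for cluster in df['cluster'].unique():
    # Select the strings of this cluster only; no truth-value test is needed
    subset = df.loc[df['cluster'] == cluster, 'resolution_modified']
    # mode() returns a Series, so take its first entry
    mode_string[int(cluster)] = subset.mode().iloc[0]
```

The groupby version above does the same thing in one line and is usually preferable; the loop is shown only to make the fix to the original attempt explicit.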
