Reputation: 133
I have a dataframe that looks similar to this
df = pd.DataFrame({'id': [1001, 1002, 1003, 1004, 1005, 1006]
'resolution_modified': ['It is recommended to replace scanner',
'It is recommended to replace scanner',
'It is recommended to replace laptop',
'It is recommended to replace laptop',
'It is recommended to replace printer',
'It is recommended to replace printer'],
'cluster':[1,1,2,2,3,3]})
I want to find the string in resolution_modified
that occurs the most for each unique cluster
such that I will have a map where the key is the cluster and the value would be the mode string in the resolution_modified
column.
This is what I have tried
# Get the string that occurs the most for each unqiue cluster
mode_string = {}
for cluster in hardware['cluster'].unique():
if hardware[hardware['cluster']==cluster]:
mode_string[cluster] = hardware['resolution_modified'].mode()[0]
mode_string
This did not work and throws an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 0
Views: 480
Reputation: 61910
The pandas way of doing that is to group by cluster and find the mode of resolution_modified:
res = df.groupby('cluster')['resolution_modified'].agg(pd.Series.mode)
mode_string = res.to_dict()
print(mode_string)
Output
{1: 'It is recommended to replace scanner', 2: 'It is recommended to replace laptop', 3: 'It is recommended to replace printer'}
See more on the documentation of agg and mode.
As an alternative, you could use statistics.mode:
from statistics import mode
res = df.groupby('cluster')['resolution_modified'].agg(mode)
Upvotes: 1
Reputation: 4761
You can use pandas.DataFrame.groupby
with pandas.Series.mode
:
mode_string = df.groupby("cluster")["resolution_modified"].agg(pd.Series.mode)
#cluster
#1 It is recommended to replace scanner
#2 It is recommended to replace laptop
#3 It is recommended to replace printer
You can also convert it to dict
mode_string = mode_string.to_dict()
#{1: 'It is recommended to replace scanner', 2: 'It is recommended to replace laptop', 3: 'It is recommended to replace printer'}
In both cases you can do:
mode_string[1]
#'It is recommended to replace scanner'
Upvotes: 2