Reputation: 3915
I just started working with the pandas library to analyze large datasets. I am analyzing credit card data with a column issuercountrycode that has 117 possible values. To visualize which issuer country codes are used in my dataset, I currently use the following code to generate a pie chart:
import matplotlib.pyplot as plt

df['issuercountrycode'].value_counts().plot(kind='pie')
plt.show()
This produces the following pie chart:
As you can see, this isn't ideal because many of the values occur only rarely, which clutters the chart. Is there a way in pandas to apply a threshold to the result of value_counts() and combine all values whose count falls below it into a single 'rest' group? Are these types of operations even possible in pandas?
Upvotes: 2
Views: 908
Reputation: 862511
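You can do this with boolean indexing and sum: keep the counts above the threshold, then add the total of all remaining counts under a new 'rest' label: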
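thresh = 2
a = df['issuercountrycode'].value_counts()
# keep the values that occur more than thresh times
b = a[a > thresh]
# sum the counts of all other values into a single 'rest' entry
b['rest'] = a[a <= thresh].sum()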
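The same idea can be wrapped in a small reusable function. This is only a sketch: group_rare_values and its other_label parameter are hypothetical names, not part of pandas. It builds the result with pd.concat instead of assigning to b['rest'], so the input Series is not modified, and it adds a 'rest' entry only when at least one value was actually grouped:

```python
import pandas as pd


def group_rare_values(counts: pd.Series, thresh: int, other_label: str = "rest") -> pd.Series:
    """Hypothetical helper (not a pandas API): keep counts above thresh,
    sum all other counts under other_label."""
    # Values that occur more than thresh times keep their own entry
    keep = counts[counts > thresh]
    # Total count of all values at or below the threshold
    rest_total = counts[counts <= thresh].sum()
    # Append the 'rest' entry only if something was actually grouped
    if rest_total > 0:
        keep = pd.concat([keep, pd.Series({other_label: rest_total})])
    return keep


# Small made-up example: US and NL occur 3 times, DE and FR once each
counts = pd.Series(["US", "US", "US", "NL", "NL", "NL", "DE", "FR"]).value_counts()
print(group_rare_values(counts, thresh=2))
```

The returned Series can be plotted the same way, e.g. group_rare_values(df['issuercountrycode'].value_counts(), 2).plot.pie().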
Sample:
import numpy as np
import pandas as pd

np.random.seed(10)
L = list('abcdef')
df = pd.DataFrame({'issuercountrycode': np.random.choice(L, size=15)})

thresh = 2
a = df['issuercountrycode'].value_counts()
b = a[a > thresh]
b['rest'] = a[a <= thresh].sum()
print(b)
b 5
f 3
a 3
rest 4
Name: issuercountrycode, dtype: int64
b.plot.pie()
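An alternative, sketched here on the same sample data, is to relabel the rare codes in the DataFrame itself before counting, using Series.isin and Series.where. The new column name code_grouped is a hypothetical choice. The resulting counts should contain the same numbers as b, but the 'rest' label now also exists row by row, so it can be reused for grouping or other plots:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame({"issuercountrycode": np.random.choice(list("abcdef"), size=15)})

thresh = 2
counts = df["issuercountrycode"].value_counts()
# Codes that occur more than thresh times keep their own label
frequent = counts[counts > thresh].index
# Every other code becomes 'rest' in a new column; the original column is untouched
df["code_grouped"] = df["issuercountrycode"].where(
    df["issuercountrycode"].isin(frequent), "rest"
)
grouped_counts = df["code_grouped"].value_counts()
print(grouped_counts)
```

Calling grouped_counts.plot.pie() then draws the same reduced chart.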
Upvotes: 2