HimanAB
HimanAB

Reputation: 2573

Python Pandas plot with repeated elements

I have a csv file that contains repeated items. The reason that the items are repeated is that each row is indicating a specific loan that has been paid during time. For example, loan with ID=156 has been paid in 5 payments. So the loan information is repeated in each row except the payment amount and payment date is different. I want to make a bar chart of the paid loans but counting the paid loans by counting the rows that their status is paid is not working as it counts repeated loans. I am wondering how I should fix this problem.

Here is my code :

loans[loans["status"]=="paid"] ["location.country"].value_counts().plot(kind="bar", fontsize=7)

This code is trying to plot the countries that paid their loans but as I said it is not correct as loans["status"]=="paid" could be TRUE for a loan that is repeated in multiple rows, therefore, loans[loans["status"]=="paid"] ["location.country"].value_counts() is the wrong count.

Here is an example of how data looks:

ID  status paymentAmount   paymentProcessDate     Country
156 paid     $400            2/12/2009            CountryA
156 paid     $123            2/11/2010            CountryA
156 paid     $2000           2/09/2011            CountryA
134 unpaid    Nan            2/05/2013            CountryB
134 unpaid    Nan            2/07/2013            CountryB

Upvotes: 1

Views: 96

Answers (1)

hilberts_drinking_problem
hilberts_drinking_problem

Reputation: 11602

Here is one way to filter out IDs that have already been paid.

import pandas as pd
from io import StringIO

df = pd.DataFrame.from_csv(StringIO('''ID  status paymentAmount   paymentProcessDate     Country
156 paid     $400            2/12/2009            CountryA
156 paid     $123            2/11/2010            CountryA
156 paid     $2000           2/09/2011            CountryA
134 unpaid    Nan            2/05/2013            CountryB
134 unpaid    Nan            2/07/2013            CountryB'''), sep=r' +')

paid_ids = df[df.status=='paid'].ID.drop_duplicates()

unpaid = df[~df.ID.isin(paid_ids)]

unpaid.Country.value_counts().plot(kind='bar')

Upvotes: 2

Related Questions