alejandro
alejandro

Reputation: 1

pandas loop through data frame for each unique value in column

I'm working with 2 csv files.

In the performance file: I have historical data on loan performance (i.e. loan 110 performance from month 1 to 7, then loan 111 performance from month 1 to 20).in the first file I have columns as follows: A= loan id, B= performance month (1 to 7), C=default amount. for each loanid there is 1 row per month of performance.

I'm trying to create a loop that gives me the first month that each loan has a default and copy the month and default amount into my second csv file which has descriptive data on each loanid. the idea is to add 2 columns on the second file and for each loanid, to retrieve the month when they first have a default value.

i'm working on jupyter notebook and so far I've imported pandas library and read the performance csv file.

any guidance would be appreciated.

import pandas as pd
data = pd.read_csv(r'c:\users\guest1\documents\python_example_performance.csv',delimiter=',')
data.head()

Upvotes: 0

Views: 225

Answers (1)

Jacky Tan
Jacky Tan

Reputation: 31

First of all, I can't comment as I don't have enough reputation. I would need more clarification on the issue. Could you show how the data look like? It's a bit confusing for me between the 100, 101 and the rating 1-7 or 1-20.

Based on my current understanding, I would remove non-default value first from first CSV. Since you're using Pandas, you can go through Loc. The syntax generally looks like this.

df = df[df[cols] > 0]

If they're duplicate, then keep the latest month or current month depends on your choice. Pandas support drop duplicate and have the option of keeping the first or last record. The syntax generally looks like this.

df = df.drop_duplicates(subset ="Col1", keep = 'last') 

For more documentation, please refer to : Pandas - Drop Duplicates

Lastly , you need to perform a Join for both Data Frames based on loan ID. The syntax generally looks like this.

df = pd.merge(df1, df2, how='left', on=['LoanID'])

For more documentation, please refer to : Pandas - Merge

Upvotes: 0

Related Questions