DD DD
DD DD

Reputation: 1238

How to match data between two csv file in Python

I have two csv file like below.

In 'order.csv'

date   orderName    orderNumber
1/7      Tom            1
1/7      Jeny           4
1/7      Brown          2
1/7      Tom            3 
1/8      Sky            5
1/8      Blue           7
1/8      Red            6
1/8      Wine           8
1/9      Tom            9
1/9      Earth          11
1/9      Earth          10


In 'run.csv'

date   runName    runNumber
1/7      Tom           
1/7      Jeny           
1/7      Tom            
1/7      Brown           
1/8      Sky            
1/8      Blue           
1/8      Red            
1/8      Wine           
1/9      Tom          
1/9      Earth          
1/9      Earth          

My goal is that the orderNumber has to go inside of runNumber matching the date and orderName. But the thing is that the order is different between order and run csv. And if the orderName is duplicated on the same date. It doesn't have to considerate.

In 'result.csv'

date    runName    runNumber
1/7      Tom               
1/7      Jeny          4  
1/7      Tom            
1/7      Brown         2 
1/8      Sky           5 
1/8      Blue          7 
1/8      Red           6 
1/8      Wine          8 
1/9      Tom           9          
1/9      Earth          
1/9      Earth   

How can I make using methods of python for this case?
I tried to do something but for me it's too hard problem.
So I asked here entirely. I am sorry.

Upvotes: 0

Views: 63

Answers (1)

H. W. Son
H. W. Son

Reputation: 26

From what I have gone through, it seems like you don't have to have the two files since those exactly have the same structure. The first thing you can do is loading the csv file with Pandas like this below.

import pandas as pd

df = pd.read_csv('order.csv')

Then, you can add a new column based on the other columns. And to get rid of duplicates, using Counter which is default Python library might be a good option.

from collections import Counter

name_cnt = Counter(df['orderName'])

def set_run_number(row):
    if name_cnt[row['orderName']] > 1:
        return
    else:
        return int(row['orderNumber'])


df['runNumber'] = df.apply(lambda row: set_run_number(row), axis=1)

And it goes like this.

   date orderName  orderNumber  runNumber
0   1/7       Tom            1        NaN
1   1/7      Jeny            4        4.0
2   1/7     Brown            2        2.0
3   1/7       Tom            3        NaN
4   1/8       Sky            5        5.0
5   1/8      Blue            7        7.0
6   1/8       Red            6        6.0
7   1/8      Wine            8        8.0
8   1/9       Tom            9        NaN
9   1/9     Earth           11        NaN
10  1/9     Earth           10        NaN

Upvotes: 1

Related Questions