Austin Johnson
Austin Johnson

Reputation: 747

Write value from csv to xlsx file based on matching name using python

I have two files one an xlsx file and the other a csv file. I need to take the Fantasy points from the csv file and write them to the xlxs file IF the two names match above a 50% ratio.

import pandas as pd
from fuzzywuzzy import fuzz

df = pd.read_excel('/Users/aus10/Desktop/PGA/Tournament_Results/A_Military_Tribute_at_the_Greenbrier_2020.xlsx')
df_2 = pd.read_csv('/Users/aus10/Desktop/PGA/Fantasy_Tournament_Results/Greenbrier_2020.csv')

for name in df['Player_Name']:
    for second_name in df_2['Name']:
        Ratio = Ratio = fuzz.ratio(name.lower(),second_name.lower())
        if Ratio >= 50:
            df.at[-1, 'FPTS'] = df_2['FantasyPointsDraftKings']

I'm not exactly sure how to tell it to write to that specific file when the names match but here's what I have

I want to use the name from the bottom xlsx file and if it matches the name from the csv file then I want to take the FantasyPointsDraftKings value from that name and write it to the xlxs file matching the names and adding the value to FPTS Column in the xlsx file.

enter image description here xlsx file

Google Sheets link

https://docs.google.com/spreadsheets/d/1jfr4Ma86dfLS5VnoCXTbNOJtDXxsG7HK5lt3meteEXI/edit?usp=sharing

Upvotes: 1

Views: 106

Answers (1)

bigbounty
bigbounty

Reputation: 17368

  1. First you have to read the 2 files into pandas dataframe
  2. Take a subset of relevant columns of the 2 dataframes
  3. write a function that takes the above columns (point 2) and calculates the similarity

Here I have used Jaro-Wrinkler distance to find the similarity - https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance#:~:text=In%20computer%20science%20and%20statistics,Jaro).

You can use a different algorithm for matching

    import pandas as pd
    import jellyfish
    
    csv = pd.read_csv("data.csv")
    excel = pd.read_excel("data.xlsx")
    
    def process(row):
        match = jellyfish.jaro_distance(row["Name"], row["Player_Name"])
        if match > 0.5: return row["FantasyPoints"]
        return None
    
    excel["FantasyPoints"] = pd.concat([csv["Name"],csv["FantasyPoints"],excel["Player_Name"]],axis=1).apply(process, axis=1)
    
    excel.to_excel("data.xlsx")

Output:

       Player_Name  Scrambling  Average_Driving_Distance  Fairways_Hit    GIR  Sand_Saves  Average_Putts_GIR  Total_Putts_GIR  Putts_Per_Round  SG_Putting  SG_Around_the_Green  SG_Approach_the_Green  SG_Off_the_Tee   FPTS  FantasyPoints
0  Joaquin Niemann       78.57                     313.0         64.29  80.56       83.33              1.603               93            27.50        2.20                 0.56                   1.13            0.51  142.5          142.5
1         Tom Hoge       78.57                     315.1         67.86  61.11       71.43              1.591               70            25.25        0.69                 0.74                   0.89            0.57  113.0          113.0
2     Brian Harman       83.33                     303.1         64.29  66.67       40.00              1.646               79            26.00        1.27                 0.10                   0.72            0.56  107.5          107.5

Upvotes: 1

Related Questions