The Great
The Great

Reputation: 7743

How to automate data mapping from two excel sheets

I have two excel sheets. One Excel sheet contains the user responses (option selected by the user a/1,b/2,c/3,d/4,e/5) for the survey questions/variables and other file contains the the actual value for each of these options

Please find the example scenario

Excel file 1 contains OUTPUT data as shown below

enter image description here

Excel file 2 contains INPUT/DICTIONARY data as shown below

enter image description here

I expect my output where the options like a,b,c,d are replaced with actual values

enter image description here

Upvotes: 0

Views: 2837

Answers (1)

Javapocalypse
Javapocalypse

Reputation: 2373

Approach

  • Use openpyxl read the excel file containing the mappings.
  • Each mapping is in the form of [number].[space][label] Eg. 1. Male, So split the string using '. ', so you get the number and its corresponding label. Store it in a dictionary
  • Read the file containing the inputs, now these inputs will be used as the keys for reading labels from dictionaries
  • Replace cell original value from the value of dictionary
  • Save workbook. Done!

File containing user inputs (file1.xlsx)

enter image description here

File containing mappings (file2.xlsx)

enter image description here

Code

from openpyxl import Workbook, load_workbook

#########################
#   Creating Mappings   #
#########################


# Opening file containing mappings
filename = 'file2.xlsx'

wb = load_workbook(filename)
first_sheet = wb.get_sheet_names()[0]
worksheet = wb.get_sheet_by_name(first_sheet)

# Dictonaries to store mappings
gender = {}
ethnicity = {}


#here you iterate over the rows in the specific column
for row in range(2,worksheet.max_row + 1):

    # Making a dict for gender
    for column in "A":  #Gender Col
        cell_name = "{}{}".format(column, row)
        try:
            data = worksheet[cell_name].value.split('. ') #Split data into label and number
            if data[0] not in gender:
                gender[data[0]] = data[1]
        except:
            pass

    # Making a dict for Ethnicity
    for column in "B":  #Ethnicity Col
        cell_name = "{}{}".format(column, row)
        try:
            data = worksheet[cell_name].value.split('. ') #Split data into label and number
            if data[0] not in ethnicity:
                ethnicity[data[0]] = data[1]
        except:
            pass





#########################
#   Replacing Inputs    #
#########################

# Loading file containing User Input
filename = 'file1.xlsx'

wb = load_workbook(filename)
first_sheet = wb.get_sheet_names()[0]
worksheet = wb.get_sheet_by_name(first_sheet)



#here you iterate over the rows in the specific column
for row in range(2,worksheet.max_row + 1):

    # Mapping gender input to label
    for column in "A":  #Gendedr Col
        cell_name = "{}{}".format(column, row)
        try:
            worksheet[cell_name].value = gender[str(worksheet[cell_name].value)]
        except Exception as e:
            pass

    # Mapping ethnicity input to label
    for column in "B":  #Ethnicity Col
        cell_name = "{}{}".format(column, row)
        try:
            worksheet[cell_name].value = ethnicity[str(worksheet[cell_name].value)]
        except:
            pass



wb.save(filename)

Output

enter image description here

Upvotes: 3

Related Questions