Merging values in dataframe to write in excel

Question

I have a dataframe that looks like

       column1    column2     column3    colum4  column5 
1      r_n_1      r_s_1       r_n_2      r_s_3   r_n_3
2      r_n_1      r_s_1       r_n_4      r_s_4   r_n_5
3      r_n_1      r_s_1       r_n_6      r_s_5   r_n_7
4      r_n_1      r_s_1       r_n_6      r_s_6   r_n_9
5      r_n_10     r_s_7       r_n_11     r_s_8   r_n_12
6      r_n_10     r_s_9       r_n_11     r_s_10  r_n_13

And I would like to merge cells in data frame so I could write in excel that could look like

So basically merge cells that have same value in excel. I am guessing I can use MultiIndex from pandas but I don't know how to do that.

my code to get this data frame is like.

 new_list = []
    for k1 in remove_empties_from_dict(combined_dict):
     curr_dict = remove_empties_from_dict(combined_dict)[k1]
         for k2 in curr_dict:
              curr_dict_2=curr_dict[k2]
                for k3 in curr_dict_2:
                    curr_dict_3=curr_dict_2[k3]
                       for k4 in curr_dict_3:
                            curr_dict_4=curr_dict_3[k4]
                                new_dict= {'c1': k1, 'c2': k2, 'c3': k3, 'c4': k4,'c5': curr_dict_4}
new_list.append(new_dict)
df = pd.DataFrame(new_list)

Rohan Pillai · Accepted Answer

I couldn't find a direct function that does the merging of cells with similar values so instead, I have written a code that does that.

print(df)

  column1 column2 column3 column4 column5
0   r_n_1   r_s_1   r_n_2   r_s_3   r_n_3
1   r_n_1   r_s_1   r_n_4   r_s_4   r_n_5
2   r_n_1   r_s_1   r_n_6   r_s_5   r_n_7
3   r_n_1   r_s_1   r_n_6   r_s_6   r_n_9
4  r_n_10   r_s_7  r_n_11   r_s_8  r_n_12
5  r_n_10   r_s_9  r_n_11  r_s_10  r_n_13

This is the df I have to work with. But in order to do that what I did was, I iterated it once to check which values are similar, and replaced the with a -. The reason I did not make it NoneType was because the cells below the table have a NoneType value, so the further part of the code will keep on iterating infinitely. What I did was:

for i in df.columns:
    for j in range(len(df[i])):

        for k in range(j+1,len(df[i])):
            if df[i][j]== df[i][k]:                
                df[i][k]='-'

So now my df looks something like:

print(df)

  column1 column2 column3 column4 column5
0   r_n_1   r_s_1   r_n_2   r_s_3   r_n_3
1       -       -   r_n_4   r_s_4   r_n_5
2       -       -   r_n_6   r_s_5   r_n_7
3       -       -       -   r_s_6   r_n_9
4  r_n_10   r_s_7  r_n_11   r_s_8  r_n_12
5       -   r_s_9       -  r_s_10  r_n_13

Now that I have all unique values in the Data Frame, I will check whether the df element is a valid input or a -. And the cells that are - will be merged with its upper value. I did that by:

from openpyxl.workbook import Workbook    
exportPath = r'C:\Users\T01144\Desktop\PythonExport.xlsx'

wb= Workbook()
ws=wb.active
rowInd=1
colInd=1
colList=['-', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H','I'] # Continue if there are more columns  

for i in df.columns:
    for j in range(0,len(df[i])):
        if(df[i][j]!='-'):
            ws.cell(row=rowInd,column=colInd,value=df[i][j])            
        else:
            count=0
            for l in range(j+1,len(df[i])):
                count+=1
                if df[i][l]!='-':
                    count-=1
                    break
            ws.merge_cells(str(str(colList[colInd]+str(rowInd-1))+":"+str(colList[colInd]+str(rowInd+count))))
        rowInd+=1

    colInd+=1
    rowInd=1

The output I have now is:

The entire code can be found here.

NOTE: Some of you may get this error after creating the Excel:

We found a problem with some content in 'PythonExport.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Just ignore this error and click Yes.

Merging values in dataframe to write in excel

Answers (1)

Related Questions