user1082164
user1082164

Reputation: 55

How to using pandas to cell merge

I want to combine the cells as follows.

Before:

|   | test1 | test2 | test3 |
| -:|:----- |:----- | -----:|
| 0 | value | value | value |
| 1 | test4 | test5 |
| 2 | value | value |
| 3 | test6 | test7 | test8 |
| 4 | value | value | value |
| 5 | test9 | test0 |
| 6 | value | value |

After:

|   | test1 | test2 | test3 | test4 | test5 | test6 | test7 | test8 | test9 | test0 |
| -:|:----- |:----- | ----- |:----- |:----- |:----- |:----- |:----- |:----- | -----:|
| 0 | value | value | value | value | value | value | value | value | value | value |

I want to change the cells using the Python code with the help of pandas. Please help me with this. Thank you.

Upvotes: 3

Views: 143

Answers (2)

Andy L.
Andy L.

Reputation: 25239

Slice from row 1 and pre-process it with melt to series s. Slice even rows of s for columns and odd rows for values and construct a new dataframe from these columns and values. Finally, join to row 0 of original df

Sample df:
    test1   test2   test3
0  value1  value2  value3
1   test4   test5     NaN
2  value4  value5     NaN
3   test6   test7   test8
4  value6  value7  value8
5   test9   test0     NaN
6  value9  value0     NaN

s = df.iloc[1:].melt().dropna()['value']
cols = s[0::2].tolist()
vals = s[1::2].tolist()
df_final = df.iloc[[0]].join(pd.DataFrame([vals], columns=cols)).sort_index(1)

Out[140]:
    test0   test1   test2   test3   test4   test5   test6   test7   test8  \
0  value0  value1  value2  value3  value4  value5  value6  value7  value8

    test9
0  value9

Upvotes: 2

SSharma
SSharma

Reputation: 953

Here is something you can do.

import pandas as pd

df = pd.DataFrame({'test1 ':['15','test4','79', 'test6', '34', 'test9', '323'],
                   'test2 ':['78','test5','45', 'test7', '4', 'test10', '34'],
                   'test3 ':['8','','', 'test8', '56', '', '']})
print("Original Dataframe")
print(df)

df1 = pd.DataFrame()
col_names = []
col_names = df.iloc[1::2, :].to_numpy('str').tolist()
row_values = df.iloc[2::2, :].to_numpy('str').tolist()
col_names = [j for sub in col_names for j in sub if j!= '']
row_values = [j for sub in row_values for j in sub if j!= '']
df1 = pd.DataFrame([row_values], columns= col_names)
print("Dataframe 1")
print(df1)

df2 = df.iloc[[0, ]]
print("Dataframe 2")
print(df2)

df3 = pd.concat([df2, df1], axis=1)
print("Dataframe Result")
print(df3)

The df1 contains all the data except for the first one. The df2 contains only original columns and first row. Finally, you concatenate df1 and df2 to form df3.
This gives you an output as:

Original Dataframe
  test1   test2  test3 
0     15      78      8
1  test4   test5       
2     79      45       
3  test6   test7  test8
4     34       4     56
5  test9  test10       
6    323      34       
Dataframe 1
  test4 test5 test6 test7 test8 test9 test10
0    79    45    34     4    56   323     34
Dataframe 2
  test1  test2  test3 
0     15     78      8
Dataframe Result
  test1  test2  test3  test4 test5 test6 test7 test8 test9 test10
0     15     78      8    79    45    34     4    56   323     34

Upvotes: 2

Related Questions