liule123
liule123

Reputation: 23

How to use Python to convert rows to columns in an excel (.xls) file that require the extraction of keywords?

Let's say I have an excel file as below:

enter image description here

If you are not able to view the image link, the excel file is basically like this:

Name       XXX
Age        XXX
Gender     XXX
Job        XXX
Location   XXX
...        ...
...        ...

(Repeat the list of "Name, Age, Gender, Job, Location".) There are at least 1000 or more lists in the file. "XXX" represent the content correspond to each field of the list.

I searched online about row-to-column conversion, and I found the paste function called 'transpose' within the excel. However, it is very simple as it converts all the rows into columns.

What I want to do is to have only 5 columns, with column names of Name, Age, Gender, Job and Location.

Upvotes: 2

Views: 5798

Answers (1)

Jay Shankar Gupta
Jay Shankar Gupta

Reputation: 6098

import numpy as np
import pandas as pd
import openpyxl
dfs = pd.read_excel('input.xlsx', sheet_name=None,header=None)
tester=dfs['Sheet1'].values.tolist()
keys = list(zip(*tester))[0]
seen = set()
seen_add = seen.add
keysu= [x for x in keys if not (x in seen or seen_add(x))]
values = list(zip(*tester))[1]
a = np.array(values).reshape(int(len(values)/len(keysu)),len(keysu))
list1=[keysu]
for i in a:
    list1.append(list(i))
df=pd.DataFrame(list1)
df.to_excel('output.xlsx',index=False,header=False)

input.xlsx

enter image description here

Output.xlsx

enter image description here

Upvotes: 1

Related Questions