Reputation: 11
I'm trying to extract the 'Country' column into a Python list using pandas. Below is the code I used, along with the output (the Excel sheet is attached).
code:
from pandas import DataFrame
import pandas as pd
open_file = pd.read_excel('data.xlsx', sheet_name=0)
df = list(open_file['Country'])
print(df)
Output:
[nan, 'Great Britain', 'China ', 'Russia', 'United States', 'Korea', 'Japan', 'Germany']
Process finished with exit code 0
In the output I can see 'nan' because the 'Country' header in the sheet is two cells merged into one. How can I avoid this?
Upvotes: 1
Views: 2871
Reputation: 1068
Try this
df = pd.read_excel('data.xlsx', header=[0, 1])
df = df.rename(columns=lambda x: x if 'Unnamed' not in str(x) else '')
Now the headers are tuples. For example, to access the Country column or the Gold column, you need to write something like the statements below:
print(df[('Country', '')])
print(df[('Media Tally', 'Gold')])
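To show the tuple headers without needing the Excel file, here is a minimal sketch that builds a small frame by hand. The 'Unnamed: 0_level_1' label is what I would expect read_excel to generate for the empty cell under the merged header (an assumption, check df.columns on your own file), and the gold counts are made-up placeholder values, not data from the sheet.

```python
import pandas as pd

# Hand-built stand-in for what read_excel(..., header=[0, 1]) might return.
# The 'Unnamed: ...' label and the numbers are illustrative assumptions.
columns = pd.MultiIndex.from_tuples([
    ('Country', 'Unnamed: 0_level_1'),
    ('Media Tally', 'Gold'),
])
df = pd.DataFrame([['Great Britain', 1], ['China ', 2]], columns=columns)

# Blank out the auto-generated labels on every header level.
df = df.rename(columns=lambda x: x if 'Unnamed' not in str(x) else '')

# Select by tuple, strip stray whitespace (e.g. 'China '), and convert to a list.
countries = df[('Country', '')].str.strip().tolist()
gold = df[('Media Tally', 'Gold')].tolist()
print(countries)
print(gold)
```

Since the header rows are no longer read as data, the list should not start with nan.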
Upvotes: 1
Reputation: 192
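Use header=1 so that the second row is read as the header. The merged Country column then gets an auto-generated name such as 'Unnamed: 0' (or 'Unnamed: 1', 'Unnamed: 2', depending on its position), which you can use to get the column values as a list: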
import pandas as pd
df = pd.read_excel('data.xlsx', sheet_name=0, header=1)
print(df['Unnamed: 0'].to_list())
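If you would rather keep the original read_excel call, another option is to drop the missing value afterwards. This is only a sketch: it assumes the single nan comes from the merged header cell, and it would also drop any genuinely empty Country cells. The Series below is typed in from the output in the question rather than read from the file.

```python
import pandas as pd

# Values copied from the output shown in the question.
country = pd.Series([float('nan'), 'Great Britain', 'China ', 'Russia',
                     'United States', 'Korea', 'Japan', 'Germany'])

# Drop the nan left by the merged header cell and strip trailing spaces.
countries = country.dropna().str.strip().tolist()
print(countries)
```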
Upvotes: 0