Reputation: 2126
Basically I am trying to do the opposite of "How to generate a list from a pandas DataFrame with the column name and column values?".
To borrow that example, I want to go from the form:
data = [
['Name','Rank','Complete'],
['one', 1, 1],
['two', 2, 1],
['three', 3, 1],
['four', 4, 1],
['five', 5, 1]
]
to a DataFrame that looks like this:
       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1
However when I do something like:
pd.DataFrame(data)
I get a dataframe in which the header row is treated as ordinary data. Instead, the first list should become my column labels, and the first element of each remaining list should become the index.
Upvotes: 27
Views: 72304
Reputation: 1
Convert nested list to pandas dataframe:
import pandas as pd
# Sample data (replace with your `Final_data` if obtained from scraping)
data = [[['1', 'Walmart', 'https://www.walmart.com/'], ['2', 'Amazon', 'https://www.amazon.com/'], ['3', 'Exxon Mobil', 'https://corporate.exxonmobil.com/'], ['4', 'Apple', 'https://www.apple.com/'], ['5', 'UnitedHealth Group', 'https://www.unitedhealthgroup.com/'], ['6', 'CVS Health', 'https://www.cvshealth.com/'], ['7', 'Berkshire Hathaway', 'https://www.berkshirehathaway.com/'], ['8', 'Alphabet', 'https://abc.xyz/'], ['9', 'McKesson', 'https://www.mckesson.com/'], ['10', 'Chevron', 'https://www.chevron.com/']], [['11', 'AmerisourceBergen', 'https://www.amerisourcebergen.com/'], ['12', 'Costco Wholesale', 'https://www.costco.com/'], ['13', 'Microsoft', 'https://www.microsoft.com/'], ['14', 'Cardinal Health', 'https://www.cardinalhealth.com/'], ['15', 'Cigna', 'https://www.cigna.com/'], ['16', 'Marathon Petroleum', 'https://www.marathonpetroleum.com/'], ['17', 'Phillips 66', 'https://www.phillips66.com/'], ['18', 'Valero Energy', 'https://www.valero.com/'], ['19', 'Ford Motor', 'https://www.ford.com/'], ['20', 'Home Depot', 'https://www.homedepot.com/']]]
# Create a DataFrame from the list, flattening each sublist into rows
df = pd.DataFrame([item for sublist in data for item in sublist])
# Rename columns (assuming the first element in each sublist is the S.No)
df.columns = ['S. No', 'Name', 'URL']
print(df)
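The same flattening can also be written with itertools.chain, passing the column labels at construction. This is only a sketch: the shortened `nested` list below is a stand-in for the full data above, and converting 'S. No' to integers is my assumption about what you want, not something the original data requires.

```python
from itertools import chain

import pandas as pd

# Shortened stand-in for the doubly nested list above (assumption: same shape)
nested = [
    [['1', 'Walmart', 'https://www.walmart.com/'],
     ['2', 'Amazon', 'https://www.amazon.com/']],
    [['11', 'AmerisourceBergen', 'https://www.amerisourcebergen.com/']],
]

# chain.from_iterable removes one level of nesting, leaving one list per row
rows = list(chain.from_iterable(nested))

# Pass the labels directly instead of assigning df.columns afterwards
flat_df = pd.DataFrame(rows, columns=['S. No', 'Name', 'URL'])

# The serial numbers arrive as strings; convert them if numeric sorting matters
flat_df['S. No'] = flat_df['S. No'].astype(int)
print(flat_df)
```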
Upvotes: 0
Reputation: 23449
To create the desired dataframe directly at construction, the list can be converted into a NumPy array and sliced accordingly.
import numpy as np
import pandas as pd
# dtype=object keeps the mixed strings and integers intact
arr = np.array(data, dtype=object)
df = pd.DataFrame(arr[1:, 1:], index=pd.Index(arr[1:, 0], name=arr[0,0]), columns=arr[0, 1:], dtype=int)
Alternatively, since the data looks like a CSV file read into a Python list, it can be written to an in-memory text buffer and passed to pd.read_csv. A nice thing about read_csv is that it can set MultiIndex columns, indices, etc., and infer dtypes.
from io import StringIO
df = pd.read_csv(StringIO('\n'.join(['|'.join(map(str, row)) for row in data])), sep='|', index_col=[0])
A convenience function for the latter method:
from io import StringIO

def read_list(data, index_col=None, header=0):
    sio = StringIO('\n'.join(['|'.join(map(str, row)) for row in data]))
    return pd.read_csv(sio, sep='|', index_col=index_col, header=header)

df = read_list(data, index_col=[0])
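One caveat with joining on '|': it breaks if a cell itself contains the separator or a newline. If that could happen in your data, a possible variant is to let the csv module handle the quoting before handing the buffer to read_csv. The name `read_list_csv` and the sample rows are hypothetical, made up for this sketch.

```python
import csv
from io import StringIO

import pandas as pd


def read_list_csv(rows, index_col=None, header=0):
    """Hypothetical helper: serialize rows with csv.writer, then parse with read_csv."""
    buffer = StringIO()
    # csv.writer quotes any cell containing commas, quotes, or newlines
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    buffer.seek(0)  # rewind so read_csv starts at the beginning
    return pd.read_csv(buffer, index_col=index_col, header=header)


# Made-up rows whose names contain characters that would confuse a plain join
tricky = [
    ['Name', 'Rank', 'Complete'],
    ['one, the first', 1, 1],
    ['two|2', 2, 1],
]
tricky_df = read_list_csv(tricky, index_col=0)
print(tricky_df)
```

The extra round trip through csv.writer costs a little time, but the cell contents no longer need to avoid any particular character.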
Upvotes: 1
Reputation: 91009
One way to do this is to pass the first row as the column names and give pd.DataFrame only the rows from index 1 onward:
In [8]: data = [['Name','Rank','Complete'],
...: ['one', 1, 1],
...: ['two', 2, 1],
...: ['three', 3, 1],
...: ['four', 4, 1],
...: ['five', 5, 1]]
In [10]: df = pd.DataFrame(data[1:],columns=data[0])
In [11]: df
Out[11]:
    Name  Rank  Complete
0    one     1         1
1    two     2         1
2  three     3         1
3   four     4         1
4   five     5         1
If you want to set the first column, Name, as the index, use the .set_index() method and pass in the column to use as the index. Example:
In [16]: df = pd.DataFrame(data[1:],columns=data[0]).set_index('Name')
In [17]: df
Out[17]:
       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1
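If you would rather do both steps in one call, DataFrame.from_records accepts the column labels and the name of the index column together. This is just an alternative sketch of the same idea and should give the same result as the .set_index() version above.

```python
import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

# The first row supplies the column labels; index='Name' moves that
# column out of the data and into the index
df = pd.DataFrame.from_records(data[1:], columns=data[0], index='Name')
print(df)
```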
Upvotes: 54