qwertylpc

Reputation: 2126

Nested List to Pandas Dataframe with headers

Basically, I am trying to do the opposite of "How to generate a list from a pandas DataFrame with the column name and column values?"

To borrow that example, I want to go from the form:

data = [
    ['Name','Rank','Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1]
]

which should output:

      Rank Complete
 Name
  one    1        1
  two    2        1
three    3        1
 four    4        1
 five    5        1

However when I do something like:

pd.DataFrame(data)

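I get a DataFrame in which the header row is treated as ordinary data and the columns are labeled 0, 1, 2. Instead, I want the first list to become the column labels and the first element of each remaining list to become the index.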
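For reference, this is what the plain constructor does with the list above: every sublist, including the header, becomes a data row, and the columns get default integer labels.

```python
import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

# Without columns=/index=, every sublist is treated as a data row
df = pd.DataFrame(data)

print(df.columns.tolist())  # default integer labels: [0, 1, 2]
print(df.iloc[0].tolist())  # the header row ends up as data: ['Name', 'Rank', 'Complete']
print(df[1].dtype)          # object, because strings and ints are mixed in one column
```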

Upvotes: 27

Views: 72304

Answers (3)

Thangarajtest

Reputation: 1

Convert a nested list to a pandas DataFrame:

import pandas as pd

# Sample data (replace with your `Final_data` if obtained from scraping)
data = [[['1', 'Walmart', 'https://www.walmart.com/'], ['2', 'Amazon', 'https://www.amazon.com/'], ['3', 'Exxon Mobil', 'https://corporate.exxonmobil.com/'], ['4', 'Apple', 'https://www.apple.com/'], ['5', 'UnitedHealth Group', 'https://www.unitedhealthgroup.com/'], ['6', 'CVS Health', 'https://www.cvshealth.com/'], ['7', 'Berkshire Hathaway', 'https://www.berkshirehathaway.com/'], ['8', 'Alphabet', 'https://abc.xyz/'], ['9', 'McKesson', 'https://www.mckesson.com/'], ['10', 'Chevron', 'https://www.chevron.com/']], [['11', 'AmerisourceBergen', 'https://www.amerisourcebergen.com/'], ['12', 'Costco Wholesale', 'https://www.costco.com/'], ['13', 'Microsoft', 'https://www.microsoft.com/'], ['14', 'Cardinal Health', 'https://www.cardinalhealth.com/'], ['15', 'Cigna', 'https://www.cigna.com/'], ['16', 'Marathon Petroleum', 'https://www.marathonpetroleum.com/'], ['17', 'Phillips 66', 'https://www.phillips66.com/'], ['18', 'Valero Energy', 'https://www.valero.com/'], ['19', 'Ford Motor', 'https://www.ford.com/'], ['20', 'Home Depot', 'https://www.homedepot.com/']]]

# Create a DataFrame from the list, flattening each sublist into rows
df = pd.DataFrame([item for sublist in data for item in sublist])

# Rename columns (assuming the first element in each sublist is the S.No)
df.columns = ['S. No', 'Name', 'URL']

print(df)
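A slightly more compact variant of the same idea, as a sketch: itertools.chain.from_iterable does the one level of flattening, and the column names go straight into the constructor instead of being assigned afterwards. The shortened "pages" below are only a stand-in for the scraped list, and the final astype(int) is an optional extra step, assuming you want S. No as a number rather than a string.

```python
from itertools import chain
import pandas as pd

# Two "pages" of rows with the same shape as the data above (shortened stand-in)
pages = [
    [['1', 'Walmart', 'https://www.walmart.com/'],
     ['2', 'Amazon', 'https://www.amazon.com/']],
    [['11', 'AmerisourceBergen', 'https://www.amerisourcebergen.com/'],
     ['12', 'Costco Wholesale', 'https://www.costco.com/']],
]

# chain.from_iterable yields the rows of each page in order (one level of flattening)
rows = list(chain.from_iterable(pages))
df = pd.DataFrame(rows, columns=['S. No', 'Name', 'URL'])

# Optional: the serial numbers were scraped as strings, convert them to integers
df['S. No'] = df['S. No'].astype(int)
print(df)
```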

Upvotes: 0

cottontail

Reputation: 23449

To create the desired DataFrame directly in the constructor, the list can be converted into a NumPy array and sliced: row 0 supplies the column labels and column 0 supplies the index.

import numpy as np
import pandas as pd
arr = np.array(data, dtype=object)
df = pd.DataFrame(arr[1:, 1:], index=pd.Index(arr[1:, 0], name=arr[0,0]), columns=arr[0, 1:], dtype=int)
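The dtype=object in the np.array call matters. Without it, NumPy picks one common type for the mixed strings and integers and turns every cell into a string, so the slicing still works but the numbers arrive as text. A quick illustration on a shortened copy of the data:

```python
import numpy as np

data = [['Name', 'Rank', 'Complete'], ['one', 1, 1], ['two', 2, 1]]

# Default conversion: a single fixed-width Unicode string dtype for every cell
as_str = np.array(data)
print(as_str.dtype.kind, as_str[1, 1])  # U 1  (the 1 is now a string)

# dtype=object keeps the original Python objects, so the ints stay ints
as_obj = np.array(data, dtype=object)
print(as_obj.dtype, type(as_obj[1, 1]).__name__)  # object int
```

The dtype=int passed to pd.DataFrame in the answer then turns the object-typed slice into proper integer columns.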

Another method: since the data looks like a CSV file read into a Python list, it can be converted into an in-memory text buffer and passed to pd.read_csv. A nice thing about read_csv is that it can set MultiIndex columns, indices, etc., and it infers dtypes.

from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO('\n'.join(['|'.join(map(str, row)) for row in data])), sep='|', index_col=[0])
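To illustrate the MultiIndex point, here is a sketch on hypothetical data with two leading key columns (the Group column is invented for the example): passing two positions to index_col builds a two-level row index.

```python
from io import StringIO
import pandas as pd

# Hypothetical data: two key columns followed by one value column
data = [
    ['Group', 'Name', 'Rank'],
    ['a', 'one', 1],
    ['a', 'two', 2],
    ['b', 'three', 3],
]

# Same join trick as above; assumes '|' never appears inside a field
text = '\n'.join('|'.join(map(str, row)) for row in data)
df = pd.read_csv(StringIO(text), sep='|', index_col=[0, 1])

print(df.index.names)             # ['Group', 'Name']
print(df.loc[('a', 'two'), 'Rank'])
```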


A convenience function for the latter method:

from io import StringIO
import pandas as pd

def read_list(data, index_col=None, header=0):
    # Serialize each row as a '|'-separated line, then let read_csv parse and infer dtypes
    sio = StringIO('\n'.join(['|'.join(map(str, row)) for row in data]))
    return pd.read_csv(sio, sep='|', index_col=index_col, header=header)

df = read_list(data, index_col=[0])
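One caveat with joining on '|': if a value itself contains the separator, that row splits into too many fields. A possible alternative, assuming you want to keep the read_csv route, is to let the standard csv module write the buffer so that such fields get quoted. read_list_quoted is a hypothetical name used only for this sketch.

```python
import csv
from io import StringIO
import pandas as pd

def read_list_quoted(data, index_col=None, header=0):
    """Like read_list, but csv.writer quotes any field containing a comma."""
    sio = StringIO()
    csv.writer(sio, lineterminator='\n').writerows(data)
    sio.seek(0)  # rewind so read_csv starts at the first line
    return pd.read_csv(sio, index_col=index_col, header=header)

data = [
    ['Name', 'Rank', 'Complete'],
    ['one, the first', 1, 1],  # contains the default ',' separator
    ['two', 2, 1],
]
df = read_list_quoted(data, index_col=0)
print(df)
```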

Upvotes: 1

Anand S Kumar

Reputation: 91009

One way to do this is to take the column names from the first sublist and pass only the remaining rows (from index 1 onward) to pd.DataFrame:

In [8]: data = [['Name','Rank','Complete'],
   ...:                ['one', 1, 1],
   ...:                ['two', 2, 1],
   ...:                ['three', 3, 1],
   ...:                ['four', 4, 1],
   ...:                ['five', 5, 1]]

In [10]: df = pd.DataFrame(data[1:],columns=data[0])

In [11]: df
Out[11]:
    Name  Rank  Complete
0    one     1         1
1    two     2         1
2  three     3         1
3   four     4         1
4   five     5         1

If you want to use the first column, Name, as the index, call the .set_index() method and pass it the column to use as the index. Example:

In [16]: df = pd.DataFrame(data[1:],columns=data[0]).set_index('Name')

In [17]: df
Out[17]:
       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1
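Since the question is the inverse of turning a DataFrame into a list of lists, one way to sanity-check the result is to go back again: reset_index() moves Name back into an ordinary column, and the header plus the row values should reproduce the original data.

```python
import pandas as pd

data = [['Name', 'Rank', 'Complete'],
        ['one', 1, 1],
        ['two', 2, 1],
        ['three', 3, 1],
        ['four', 4, 1],
        ['five', 5, 1]]

df = pd.DataFrame(data[1:], columns=data[0]).set_index('Name')

# Inverse direction: index back to a column, then header row + data rows
flat = df.reset_index()
back = [flat.columns.tolist()] + flat.to_numpy().tolist()
print(back == data)  # True
```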

Upvotes: 54
