Elizabeth Susan Joseph

Reputation: 6515

How to convert a list into a pandas DataFrame

I have the following code:

rows = []
for dt in new_info:
    x = dt['state']
    est = dt['estimates']

    col_R = [val['choice'] for val in est if val['party'] == 'Rep']
    col_D = [val['choice'] for val in est if val['party'] == 'Dem']

    incumb = [val['party'] for val in est if val['incumbent'] == True ]

    rows.append((x, col_R, col_D, incumb))

Now I want to convert my rows list into a pandas DataFrame. The list has 32 entries, and its structure is shown below.

[image: structure of the rows list]

When I convert this into a pandas DataFrame, each cell holds a list rather than a plain value:

pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])  

[image: resulting DataFrame]

But I want my DataFrame to look like this:

[image: desired DataFrame]

The new_info variable looks like this:

[image: contents of new_info]
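For readers without the screenshots, here is a minimal sketch that reproduces the situation. The contents of new_info are an assumption: a small stand-in inferred from the keys the loop reads ('state', 'estimates', 'choice', 'party', 'incumbent'), not the real 32-entry data.

```python
import pandas as pd

# Hypothetical stand-in for new_info; the real data was only shown as an image.
new_info = [
    {"state": "KY",
     "estimates": [
         {"choice": "McConnell", "party": "Rep", "incumbent": True},
         {"choice": "Grimes", "party": "Dem", "incumbent": False},
     ]},
    {"state": "MI",
     "estimates": [
         {"choice": "Land", "party": "Rep", "incumbent": False},
         {"choice": "Peters", "party": "Dem", "incumbent": False},
     ]},
]

rows = []
for dt in new_info:
    est = dt["estimates"]
    # Each comprehension returns a list, even when it matches a single item.
    col_R = [val["choice"] for val in est if val["party"] == "Rep"]
    col_D = [val["choice"] for val in est if val["party"] == "Dem"]
    incumb = [val["party"] for val in est if val["incumbent"] == True]
    rows.append((dt["state"], col_R, col_D, incumb))

df = pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])
print(df)  # the R, D and incumbent cells hold lists, not strings
```

Because every comprehension produces a list, each cell ends up wrapping its value in a list, and a state with no incumbent gets an empty list.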

Upvotes: 14

Views: 90863

Answers (2)

alacy

Reputation: 5074

You can use Python's built-in str.join together with apply to unwrap the lists, for example:

df['col1'] = df['col1'].apply(lambda i: ''.join(i))

which will produce:

    col1 col2
0    a  [d]
1    b  [e]
2    c  [f]

col2 is deliberately left unformatted to show the contrast.
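The starting frame is not shown above, so here is a minimal self-contained sketch. It assumes both columns hold one-element lists of strings, which matches the output shown.

```python
import pandas as pd

# Assumed starting frame: every cell is a one-element list.
df = pd.DataFrame({"col1": [["a"], ["b"], ["c"]],
                   "col2": [["d"], ["e"], ["f"]]})

# Join each list in col1 into a plain string; col2 is left untouched.
df["col1"] = df["col1"].apply(lambda i: "".join(i))
print(df)
```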

Edit

As requested by the OP: to run the same apply(lambda ...) on every column, you can either repeat the line above for each column, replacing 'col1' with the column name, or loop over the columns.

If you have a DataFrame like this:

x = [['a'],['b'],['c'],['d']]
y = [['e'],['f'],['g'],['h']]
z = [['i'],['j'],['k'],['l']]

df = pd.DataFrame({'col1':x, 'col2':y, 'col3':z})

then you can loop over the columns:

for col in df.columns:
    df[col] = df[col].apply(lambda i: ''.join(i))

which converts a data frame that starts like:

   col1 col2 col3
0  [a]  [e]  [i]
1  [b]  [f]  [j]
2  [c]  [g]  [k]
3  [d]  [h]  [l]

into:

    col1 col2 col3
0    a    e    i
1    b    f    j
2    c    g    k
3    d    h    l
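One caveat worth checking against your own data: ''.join turns an empty list into an empty string and concatenates lists with several items. If some cells can be empty (for example, a state with no incumbent), a variant that takes the first element and falls back to NaN may fit better. The helper name first_or_nan and the sample values are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "R": [["McConnell"], ["Land"]],
    "incumbent": [["Rep"], []],  # second row has no incumbent
})

def first_or_nan(items):
    """Hypothetical helper: first element of a list, or NaN if the list is empty."""
    return items[0] if items else np.nan

# For comparison: ''.join maps the empty list to an empty string.
joined = df["incumbent"].apply(lambda i: "".join(i))

# Unwrap every column, keeping empty lists as missing values.
for col in df.columns:
    df[col] = df[col].apply(first_or_nan)

print(joined.tolist())
print(df)
```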

Upvotes: 7

Aaron Hall

Reputation: 395005

Since you don't want the values in the columns to be lists, I would use a generator to strip the list wrapping each item:

import pandas as pd
import numpy as np
rows = [(u'KY', [u'McConnell'], [u'Grimes'], [u'Rep']),
        (u'AR', [u'Cotton'], [u'Pryor'], [u'Dem']),
        (u'MI', [u'Land'], [u'Peters'], [])]

def get(r, nth):
    '''helper function to retrieve item from nth list in row r'''
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

The generator behaves like the function below, but instead of materializing an intermediate list in memory, it hands each row to the consumer as that row is requested:

def remove_list_items(list_of_records):
    result = []
    for r in list_of_records:
        result.append((r[0], get(r, 1), get(r, 2), get(r, 3)))
    return result
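To make the difference concrete, here is a small sketch of how the generator version produces rows only on demand. It reuses the get helper and two of the sample rows from above; the variable names gen, first, rest and leftover are illustrative.

```python
import numpy as np

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """Return the item from the nth list in row r, or NaN if that list is empty."""
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

gen = remove_list_items(rows)  # no row has been processed yet
first = next(gen)              # processes only the first row
rest = list(gen)               # consumes the remaining rows
leftover = list(gen)           # the generator is now exhausted

print(first)
print(rest)
print(leftover)
```

Note that a generator can be consumed only once, so create a fresh one (or use the list version) if you need to pass the rows to more than one consumer.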

Then build your DataFrame by passing your data through the generator (or the list version, if you prefer):

>>> df = pd.DataFrame.from_records(
        remove_list_items(rows), 
        columns=["State", "R", "D", "incumbent"])
>>> df
  State          R       D incumbent
0    KY  McConnell  Grimes       Rep
1    AR     Cotton   Pryor       Dem
2    MI       Land  Peters       NaN

Or you could use a list comprehension or a generator expression (shown) to do essentially the same:

>>> df = pd.DataFrame.from_records(
      ((r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows), 
      columns=["State", "R", "D", "incumbent"])
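For completeness, the list comprehension variant mentioned above could look like the sketch below. It builds the full list of cleaned records first, which is fine for a small data set such as this one; the name records is illustrative.

```python
import numpy as np
import pandas as pd

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """Return the item from the nth list in row r, or NaN if that list is empty."""
    return r[nth][0] if r[nth] else np.nan

# List comprehension: all cleaned records are materialized before the DataFrame is built.
records = [(r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows]

df = pd.DataFrame.from_records(
    records, columns=["State", "R", "D", "incumbent"])
print(df)
```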

Upvotes: 9
