I have the following code:
rows = []
for dt in new_info:
    x = dt['state']
    est = dt['estimates']
    col_R = [val['choice'] for val in est if val['party'] == 'Rep']
    col_D = [val['choice'] for val in est if val['party'] == 'Dem']
    incumb = [val['party'] for val in est if val['incumbent'] == True]
    rows.append((x, col_R, col_D, incumb))
Now I want to convert my rows list into a pandas DataFrame. Each entry in rows is a tuple of (state, list of Rep candidates, list of Dem candidates, list of incumbent parties), and the list has 32 entries.
When I convert it into a pandas DataFrame with the call below, every cell holds a list instead of a plain value:
pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])
But I want each cell of the data frame to contain the plain value, not a one-element list.
The new_info variable looks like this:
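A minimal reproduction of the problem. The new_info below is hypothetical: its shape is inferred only from the keys the loop reads (state, estimates, choice, party, incumbent), and the names are borrowed from the sample rows in one of the answers, so it is not the real data.

```python
import pandas as pd

# Hypothetical new_info, inferred from the keys used in the loop above
new_info = [
    {"state": "KY", "estimates": [
        {"choice": "McConnell", "party": "Rep", "incumbent": True},
        {"choice": "Grimes", "party": "Dem", "incumbent": False},
    ]},
    {"state": "MI", "estimates": [
        {"choice": "Land", "party": "Rep", "incumbent": False},
        {"choice": "Peters", "party": "Dem", "incumbent": False},
    ]},
]

rows = []
for dt in new_info:
    est = dt["estimates"]
    # Each comprehension builds a list, even when it matches a single candidate
    col_R = [val["choice"] for val in est if val["party"] == "Rep"]
    col_D = [val["choice"] for val in est if val["party"] == "Dem"]
    incumb = [val["party"] for val in est if val["incumbent"]]
    rows.append((dt["state"], col_R, col_D, incumb))

df = pd.DataFrame(rows, columns=["State", "R", "D", "incumbent"])
print(df)
# R, D and incumbent hold lists such as ['McConnell'], and MI's incumbent is []
```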
Upvotes: 14
You can use Python's built-in str.join to collapse each list into a single string, applying it to a column with apply:
df['col1'] = df['col1'].apply(lambda i: ''.join(i))
which will produce:
col1 col2
0 a [d]
1 b [e]
2 c [f]
col2 has deliberately been left unformatted to show the contrast.
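The snippet above assumes a df already exists. A self-contained version, with the same data as the output shown, might look like this:

```python
import pandas as pd

# Every cell starts out as a one-element list of strings
df = pd.DataFrame({"col1": [["a"], ["b"], ["c"]],
                   "col2": [["d"], ["e"], ["f"]]})

# Join the strings inside each list of col1 into one string; col2 is untouched
df["col1"] = df["col1"].apply(lambda i: "".join(i))
print(df)
```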
As the OP requested, to apply the same apply(lambda ...) to every column, you can either repeat the line above for each column you want to change (replacing 'col1' with that column's name) or simply loop over the columns. For example, if you have a data frame like this:
import pandas as pd

x = [['a'], ['b'], ['c'], ['d']]
y = [['e'], ['f'], ['g'], ['h']]
z = [['i'], ['j'], ['k'], ['l']]
df = pd.DataFrame({'col1': x, 'col2': y, 'col3': z})
then you can loop over the columns and apply the same function to each one:
for col in df.columns:
    df[col] = df[col].apply(lambda i: ''.join(i))
which converts a data frame that starts like:
col1 col2 col3
0 [a] [e] [i]
1 [b] [f] [j]
2 [c] [g] [k]
3 [d] [h] [l]
and becomes
col1 col2 col3
0 a e i
1 b f j
2 c g k
3 d h l
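One caveat worth checking before using the loop: ''.join concatenates every string in the list and returns an empty string for an empty list, so missing values become '' rather than NaN. The sketch below reuses the data above but, as a hypothetical change, makes the last cell of col3 an empty list (like a state with no incumbent in the question).

```python
import pandas as pd

x = [["a"], ["b"], ["c"], ["d"]]
y = [["e"], ["f"], ["g"], ["h"]]
z = [["i"], ["j"], ["k"], []]  # hypothetical: last cell is an empty list

df = pd.DataFrame({"col1": x, "col2": y, "col3": z})

# Apply the same join to every column
for col in df.columns:
    df[col] = df[col].apply(lambda i: "".join(i))

print(df)
# The empty list in col3 becomes '' (an empty string), not NaN
```

Likewise, a cell like ['a', 'b'] would become 'ab', and a list containing non-strings would make join raise a TypeError.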
Upvotes: 7
Since you don't want the values in the columns to be lists, I would use a generator to unwrap the lists around your items:
import pandas as pd
import numpy as np

rows = [(u'KY', [u'McConnell'], [u'Grimes'], [u'Rep']),
        (u'AR', [u'Cotton'], [u'Pryor'], [u'Dem']),
        (u'MI', [u'Land'], [u'Peters'], [])]

def get(r, nth):
    '''helper function to retrieve item from nth list in row r'''
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)
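To see what the helper and the generator actually produce before pandas gets involved, you can materialize the generator with list(). Note that get keeps only the first item of each list and silently drops any others; the "TX" row below is a hypothetical example added to show that.

```python
import numpy as np

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """Return the first item of the nth list in row r, or NaN if that list is empty."""
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

unwrapped = list(remove_list_items(rows))
print(unwrapped[0])  # ('KY', 'McConnell', 'Grimes', 'Rep')

# Hypothetical row with two candidates: only the first one survives
print(get(("TX", ["A", "B"]), 1))  # A
```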
The generator behaves like the following function, but instead of needlessly building an intermediate list in memory, it hands each row directly to whatever consumes it:
def remove_list_items(list_of_records):
    result = []
    for r in list_of_records:
        result.append((r[0], get(r, 1), get(r, 2), get(r, 3)))
    return result
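The difference between the two versions can be observed directly: calling the generator function returns a generator object without running the loop, rows are produced one at a time as they are requested, and once consumed the generator is exhausted. That last point matters if you want to reuse the same generator object twice.

```python
import types
import numpy as np

def get(r, nth):
    return r[nth][0] if r[nth] else np.nan

def remove_list_items(list_of_records):
    for r in list_of_records:
        # This body only runs when the consumer asks for the next row
        yield r[0], get(r, 1), get(r, 2), get(r, 3)

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"])]

gen = remove_list_items(rows)
print(isinstance(gen, types.GeneratorType))  # True
print(next(gen))   # ('KY', 'McConnell', 'Grimes', 'Rep')
print(list(gen))   # [('AR', 'Cotton', 'Pryor', 'Dem')], only the remaining row
print(list(gen))   # [], the generator is now exhausted
```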
Then build your DataFrame by passing your data through the generator (or through the list version, if you prefer):
>>> df = pd.DataFrame.from_records(
...     remove_list_items(rows),
...     columns=["State", "R", "D", "incumbent"])
>>> df
State R D incumbent
0 KY McConnell Grimes Rep
1 AR Cotton Pryor Dem
2 MI Land Peters NaN
Alternatively, you could use a list comprehension or a generator expression (the latter is shown here) to do essentially the same thing:
>>> df = pd.DataFrame.from_records(
...     ((r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows),
...     columns=["State", "R", "D", "incumbent"])
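As a standalone script rather than an interactive session, the generator-expression and list-comprehension variants can be checked against each other. Both give the same frame; the only difference is whether the full list of unwrapped rows exists in memory before from_records reads it.

```python
import numpy as np
import pandas as pd

rows = [("KY", ["McConnell"], ["Grimes"], ["Rep"]),
        ("AR", ["Cotton"], ["Pryor"], ["Dem"]),
        ("MI", ["Land"], ["Peters"], [])]

def get(r, nth):
    """First item of the nth list in row r, or NaN if that list is empty."""
    return r[nth][0] if r[nth] else np.nan

columns = ["State", "R", "D", "incumbent"]

# Generator expression: rows are produced lazily as from_records consumes them
df_gen = pd.DataFrame.from_records(
    ((r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows),
    columns=columns)

# List comprehension: the same rows, but the whole list is built first
df_list = pd.DataFrame.from_records(
    [(r[0], get(r, 1), get(r, 2), get(r, 3)) for r in rows],
    columns=columns)

print(df_gen)
```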
Upvotes: 9