Reputation: 3031
If I have the following, how do I make pd.DataFrame() turn this array into a dataframe with two columns? What's the most efficient way? My current approach copies each element into a Series and builds a dataframe out of each one.
From this:
([[u'294 (24%) L', u'294 (26%) R'],
[u'981 (71%) L', u'981 (82%) R'],])
to
x y
294 294
981 981
rather than
x
[u'294 (24%) L', u'294 (26%) R']
This is my current approach. I'm looking for something more efficient:
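import pandas as pd
# numlist is the nested list shown above
numL = pd.Series(numlist).map(lambda x: x[0])  # first item of each pair
numR = pd.Series(numlist).map(lambda x: x[1])  # second item of each pair
nL = pd.DataFrame(numL, columns=['left_num'])
nR = pd.DataFrame(numR, columns=['right_num'])
nLR = nL.join(nR)  # join on the shared default index
nLR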
UPDATE:
I noticed that my error simply comes down to calling pd.DataFrame() on a list versus a Series. When you create a dataframe out of a Series, it puts each item's elements into the same column. Not so with a list. That solved my problem in the most efficient way.
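A minimal sketch of the difference described in the update, assuming numlist holds the nested list from the question (the names from_list and from_series are only illustrative):

```python
import pandas as pd

numlist = [[u'294 (24%) L', u'294 (26%) R'],
           [u'981 (71%) L', u'981 (82%) R']]

# A list of lists: each inner list becomes a row, each position a column.
from_list = pd.DataFrame(numlist, columns=['left_num', 'right_num'])

# A Series of lists: each inner list stays a single object in one column.
from_series = pd.DataFrame(pd.Series(numlist))

print(from_list.shape)
print(from_series.shape)
```

Passing the nested list directly avoids the intermediate Series and the join.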
Upvotes: 14
Views: 44430
Reputation: 879271
import pandas as pd
data = [[u'294 (24%) L', u'294 (26%) R'], [u'981 (71%) L', u'981 (82%) R'],]
# Keep only the leading number of each string and convert it to int
clean_data = [[int(item.split()[0]) for item in row] for row in data]
# clean_data: [[294, 294], [981, 981]]
pd.DataFrame(clean_data, columns=list('xy'))
#      x    y
# 0  294  294
# 1  981  981
#
# [2 rows x 2 columns]
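If the strings are already in a dataframe, the same split-and-take-first-token idea can probably be applied per column with the .str accessor. This is only a sketch of an alternative, not the approach used above, and it assumes every cell starts with an integer followed by whitespace:

```python
import pandas as pd

data = [['294 (24%) L', '294 (26%) R'], ['981 (71%) L', '981 (82%) R']]
df = pd.DataFrame(data, columns=list('xy'))

# For each column: split on whitespace, keep the first token, cast to int.
df = df.apply(lambda col: col.str.split().str[0].astype(int))

print(df)
```

For a small list the list comprehension is likely simpler. The column-wise version may be more convenient when the data has already been loaded into pandas.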
Upvotes: 16