Reputation: 15466
I have a DataFrame
that looks like this:
id t_l
0 100 [('a', 1), ('b', 2)]
1 151 [('x', 4), ('y', 3)]
I need to return a DataFrame
that looks like this:
id f g
0 100 'a' 1
1 'b' 2
2 151 'x' 4
3 'y' 3
What is the best approach to this?
Upvotes: 1
Views: 681
Reputation: 340
new = pd.DataFrame(columns=['id','f','g'])
for ind, row in df.iterrows():
x, y = row
for i in range(len(y)):
(a,b) = y[i]
if i==0:
new.loc[str(ind)] = x,a,b
else:
new.loc[str(ind)+str(i)] = '',a,b
This will unpack the t_1 elements systematically and output the dataframe in a structure you want.
Output:
id f g
0 100 a 1
01 b 2
1 151 x 4
11 y 3
Caution: It is a simple approach, however can be significantly more costly if used for large amounts of data.
Upvotes: 1
Reputation: 862581
Use nested list comprehension with zip
:
zipped = zip(df['id'], df['t_l'])
df = pd.DataFrame([(i, y1, y2) for i, x in zipped for y1, y2 in x], columns=['id','f','g'])
print (df)
id f g
0 100 a 1
1 100 b 2
2 151 x 4
3 151 y 3
If need MultiIndex Series:
zipped = zip(df['id'], df['t_l'])
s = pd.Series({(i, y1):y2 for i, x in zipped for y1, y2 in x})
print (s)
100 a 1
b 2
151 x 4
y 3
dtype: int64
Upvotes: 1
Reputation:
Use apply()
AND pd.multiindex
:-
df = pd.DataFrame(data = [ [100, [('a', 1), ('b', 2)] ] , [151,[('x', 4), ('y', 3)]]
], columns=['id', 't_1'] )
inside, outside, value = [], [], []
c = 0
def get_outside(x):
global c
length = len( df['t_1'][c] )
c+=1
for var in range(length):
outside.append(x)
def get_inside_value(y):
for var in y:
inside.append(var[0])
value.append(var[1])
df['id'].apply(get_outside)
df['t_1'].apply(get_inside_value)
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
new_df = pd.DataFrame(value ,index=hier_index,columns=['g',])
new_df.index.names = ['id','f']
new_df
Output
g
id f
100 a 1
b 2
151 x 4
y 3
Upvotes: 1
Reputation: 25239
Edit:
@ALollz makes a good point about speed of np.concatenate
vs. chain.from_iterable(df.t_l)
. I %timeit
and it is true. Therefore, I added solution using from_iterable(df.t_l)
from itertools import chain
pd.DataFrame(chain.from_iterable(df.t_l), index=np.repeat(df.id, df.t_l.str.len()), \
columns=['f', 'g']).reset_index()
Original:
I would construct a new df
using np.concatenate
for data and np.repeat
for index. Finally, reset_index
to put id
back to column
pd.DataFrame(np.concatenate(df.t_l), index=np.repeat(df.id, df.t_l.str.len()), \
columns=['f', 'g']).reset_index()
Out[596]:
id f g
0 100 a 1
1 100 b 2
2 151 x 4
3 151 y 3
Upvotes: 2
Reputation: 3770
df.set_index('id').apply(lambda x: pd.Series([list(i) for i in list(x.t_l)]), axis=1).stack().apply(pd.Series).rename({0:'f',1:'g'}, axis=1).reset_index(level=1, drop=True)
OR.. removing the pd.Series
approach for efficiency
pd.DataFrame(df.set_index('id').apply(lambda x: pd.Series([list(i) for i in list(x.t_l)]), axis=1).stack()).rename({0:'f',1:'g'}, axis=1).reset_index(level=1, drop=True)
Output
f g
id
100 a 1
100 b 2
151 x 4
151 y 3
Upvotes: 1