Reputation: 9898
I am trying to write a one-liner in IPython, but I get SyntaxError: invalid syntax. The code is the following:
for zzz in ddd.index: zzz1 = zzz.split('///'); zzz3 = [zzz2.strip() for zzz2 in zzz1 if len(zzz1) > 1]; for zzz4 in zzz3: ddd.ix[zzz4]['Class'] = ddd.ix[zzz]['Class']; del ddd.ix[zzz]
It is meant to work as follows: for each value in the index of the DataFrame ddd, I split it using /// as a separator. Then, if multiple values are returned, I create a row for each value and remove the original row.
For example, I have:
             Class
lal              1
eri /// iii      2
aks              3
I want to obtain:
     Class
lal      1
eri      2
iii      2
aks      3
The first column ('lal', 'eri', ...) is the index of the DataFrame.
How can I achieve this? I have searched through the documentation but could not work out how to do it.
Thanks
Upvotes: 0
Views: 123
Reputation: 353419
Here's a version at the opposite end of the spectrum from @Jeff's: horribly slow, but pretty clear, I think.
index_pairs = [(ind, subind.strip()) for ind in df.index for subind in ind.split("///")]
old_i, new_i = zip(*index_pairs)
df2 = df.loc[list(old_i)]  # .loc replaces .ix, which was removed in pandas 1.0
df2.index = new_i
Note that this assumes the original indices are unique.
Start with our frame:
>>> df
             Class
lal              1
eri /// iii      2
aks              3
Make a list of pairs connecting the original index with as many new subindices as needed:
>>> index_pairs = [(ind, subind.strip()) for ind in df.index for subind in ind.split("///")]
>>> index_pairs
[('lal', 'lal'), ('eri /// iii', 'eri'), ('eri /// iii', 'iii'), ('aks', 'aks')]
Transpose:
>>> old_i, new_i = zip(*index_pairs)
>>> old_i
('lal', 'eri /// iii', 'eri /// iii', 'aks')
>>> new_i
('lal', 'eri', 'iii', 'aks')
Use the old indices to index into df:
>>> df2 = df.loc[list(old_i)]  # .loc replaces .ix, which was removed in pandas 1.0
>>> df2
             Class
lal              1
eri /// iii      2
eri /// iii      2
aks              3
Reset the indices:
>>> df2.index = new_i
>>> df2
     Class
lal      1
eri      2
iii      2
aks      3
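For convenience, the steps above can be collected into one function. This is a minimal sketch rather than a definitive implementation: the name split_index_rows and its sep parameter are hypothetical, the uniqueness check is my addition to make the stated assumption explicit, and it uses .loc in place of .ix, which newer pandas versions no longer provide.

```python
import pandas as pd


def split_index_rows(df, sep="///"):
    """Return a new frame with one row per sep-separated index label.

    Hypothetical helper: the name and the `sep` parameter are illustrative.
    """
    # The approach relies on unique labels, so fail early otherwise.
    if not df.index.is_unique:
        raise ValueError("index labels must be unique")
    # Pair every original label with each of its stripped sub-labels.
    pairs = [(label, part.strip())
             for label in df.index
             for part in label.split(sep)]
    old_labels, new_labels = zip(*pairs)
    # Selecting with a list of labels repeats a row once per request.
    out = df.loc[list(old_labels)].copy()
    out.index = list(new_labels)
    return out


df = pd.DataFrame({"Class": [1, 2, 3]},
                  index=["lal", "eri /// iii", "aks"])
result = split_index_rows(df)
print(result)
```

The original frame is left untouched, so the result can be compared against it before the old one is discarded.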
Upvotes: 1
Reputation: 129018
Not sure what you are trying to do here.
In [13]: df
Out[13]:
             A  B
0          lal  1
1  eri /// iii  2
2          aks  3
Here is a horribly long expression to do this. The good news is that it will be pretty fast.
# Series below is pandas.Series (from pandas import Series); the pattern is a raw string
In [56]: split = df['A'].str.split(r'\s+///\s+').apply(Series)
In [57]: split
Out[57]:
     0    1
0  lal  NaN
1  eri  iii
2  aks  NaN
In [58]: indexed = split.unstack().dropna()
In [59]: indexed
Out[59]:
0  0    lal
   1    eri
   2    aks
1  1    iii
dtype: object
In [61]: grouped = indexed.groupby(level=1).apply(
   ....:     lambda x: Series(x.values, index=list(x.index.get_level_values(1))))
In [62]: grouped
Out[62]:
0  0    lal
1  1    eri
   1    iii
2  2    aks
dtype: object
In [63]: grouped.reset_index().set_index('level_1')
Out[63]:
         level_0    0
level_1
0              0  lal
1              1  eri
1              1  iii
2              2  aks
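If the labels live in a regular column, as in the frame above, newer pandas versions (0.25 and later) provide DataFrame.explode, which gives a shorter route to a similar result. This is a different technique from the unstack/groupby chain above, offered only as a sketch; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["lal", "eri /// iii", "aks"], "B": [1, 2, 3]})

# Split each entry of A into a list, then give every list element its own row.
# The other columns (here B) are repeated for each new row.
out = (
    df.assign(A=df["A"].str.split("///"))
      .explode("A")
      .reset_index(drop=True)
)
# Remove the whitespace that surrounded the separator.
out["A"] = out["A"].str.strip()
print(out)
```

Calling out.set_index("A") afterwards would turn the split labels into the index, which is the layout the question asks for.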
Upvotes: 2