gc5

Reputation: 9898

IPython + pandas oneliner not working

I am trying to write a one-liner in IPython, but I get SyntaxError: invalid syntax. The code is the following:

 for zzz in ddd.index: zzz1 = zzz.split('///'); zzz3 = [zzz2.strip() for zzz2 in zzz1 if len(zzz1) > 1]; for zzz4 in zzz3: ddd.ix[zzz4]['Class'] = ddd.ix[zzz]['Class']; del ddd.ix[zzz]

I can explain it as follows: for each value in the index of the DataFrame ddd, I split it using /// as a separator. Then, if multiple values are returned, I create a row for each value and remove the original row. For example, I have:

             Class
lal          1
eri /// iii  2
aks          3

I want to obtain

             Class
lal          1
eri          2
iii          2
aks          3

The first column ('lal', 'eri', ...) is the index of the DataFrame.

How can I achieve this? I have searched through the documentation but could not figure out how to do it.

Thanks

Upvotes: 0

Views: 123

Answers (2)

DSM

Reputation: 353419

Here's a version at the opposite end of the spectrum from @Jeff's: horribly slow, but pretty clear, I think.

index_pairs = [(ind, subind.strip()) for ind in df.index for subind in ind.split("///")]
old_i, new_i = zip(*index_pairs)
df2 = df.ix[list(old_i)]
df2.index = new_i

Note that this assumes the original indices are unique.


Start with our frame:

>>> df
             Class
lal              1
eri /// iii      2
aks              3

Make a list of pairs connecting the original index with as many new subindices as needed:

>>> index_pairs = [(ind, subind.strip()) for ind in df.index for subind in ind.split("///")]
>>> index_pairs
[('lal', 'lal'), ('eri /// iii', 'eri'), ('eri /// iii', 'iii'), ('aks', 'aks')]

Transpose:

>>> old_i, new_i = zip(*index_pairs)
>>> old_i
('lal', 'eri /// iii', 'eri /// iii', 'aks')
>>> new_i
('lal', 'eri', 'iii', 'aks')

Use the old indices to index into df:

>>> df2 = df.ix[list(old_i)]
>>> df2
             Class
lal              1
eri /// iii      2
eri /// iii      2
aks              3

Reset the indices:

>>> df2.index = new_i
>>> df2
     Class
lal      1
eri      2
iii      2
aks      3
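
The snippets above rely on `.ix`, which was deprecated in pandas 0.20 and removed in pandas 1.0. A minimal sketch of the same steps using `.loc` instead, assuming a recent pandas and the example frame from the question (rebuilt here by hand), might look like this:

```python
import pandas as pd

# Example frame from the question, rebuilt by hand for illustration.
df = pd.DataFrame({"Class": [1, 2, 3]}, index=["lal", "eri /// iii", "aks"])

# Pair every original label with each of its stripped sub-labels.
index_pairs = [(ind, subind.strip()) for ind in df.index for subind in ind.split("///")]
old_i, new_i = zip(*index_pairs)

# Label-based selection with .loc repeats a row once per requested label.
df2 = df.loc[list(old_i)]
df2.index = list(new_i)
print(df2)
```

As with the original version, this assumes the original index labels are unique.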

Upvotes: 1

Jeff

Reputation: 129018

Not sure what you are trying to do here.

In [13]: df
Out[13]: 
             A  B
0          lal  1
1  eri /// iii  2
2          aks  3

Here is a horribly long expression to do this. Good news is that this will be pretty fast.

In [56]: split = df['A'].str.split('\s+\/\/\/\s+').apply(Series)

In [57]: split
Out[57]: 
     0    1
0  lal  NaN
1  eri  iii
2  aks  NaN

In [58]: indexed = split.unstack().dropna()

In [59]: indexed
Out[59]: 
0  0    lal
   1    eri
   2    aks
1  1    iii
dtype: object

In [61]: grouped = indexed.groupby(level=1).apply(
   ....:     lambda x: Series(x.values,index=list(x.index.get_level_values(1))))

In [62]: grouped
Out[62]: 
0  0    lal
1  1    eri
   1    iii
2  2    aks
dtype: object

In [63]: grouped.reset_index().set_index('level_1')
Out[63]: 
         level_0    0
level_1              
0              0  lal
1              1  eri
1              1  iii
2              2  aks
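
On newer pandas versions (0.25 or later, where `DataFrame.explode` exists), the split-and-repeat step can probably be written more directly. This is only a sketch under that assumption, not part of the session above; it reuses the `A`/`B` frame shown earlier, and the name `out` is made up for illustration:

```python
import pandas as pd

# Same layout as the frame above: labels in column A, values in column B.
df = pd.DataFrame({"A": ["lal", "eri /// iii", "aks"], "B": [1, 2, 3]})

# Split each label into a list, then give every list element its own row.
out = df.assign(A=df["A"].str.split("///")).explode("A").reset_index(drop=True)

# Remove the whitespace left around each piece and use the pieces as the index.
out["A"] = out["A"].str.strip()
out = out.set_index("A")
print(out)
```

Rows whose label contains no `///` pass through unchanged, and the values in `B` are repeated for each piece of a split label.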

Upvotes: 2
