ljcq 09
ljcq 09

Reputation: 135

Pandas Reshape DataFrame in a loop

I am new to Pandas in python. I have a dataframe with 2 keys 15 rows each and 1 column like below

            1
key1/1     0.5
key1/2     0.5
key1/3     0
key1/4     0
key1/5     0.6
key1/6     0.7
key1/7     0
key1/8     0
key1/9     0
key1/10    0.5
key1/11    0.5
key1/12    0.5
key1/13    0
key1/14    0.5
key1/15    0.5

key2/1     0.4
key2/2     0.2
key2/3     0
key2/4     0
key2/5     0.1
key2/6     0.2
key2/7     0
key2/8     0
key2/9     0.3
key2/10    0.2
key2/11    0
key2/12    0.5
key2/13    0
key2/14    0
key2/15    0.5

I want to iterate the rows of the dataframe so each time it meets a 'zero' it creates a new column like below

            1     2     3     4 
key1/1     0.5   0.6   0.5   0.5
key1/2     0.5   0.7   0.5   0.5
key1/3     nan   nan   0.5   nan
key1/4     nan   nan   nan   nan

            1     2     3     4     5
key2/1     0.4   0.1   0.3   0.5   0.5
key2/2     0.2   0.2   0.2   nan   nan
key2/3     nan   nan   nan   nan   nan
key2/4     nan   nan   nan   nan   nan

I have tried the following code trying to iterate 'key1' only

df2=pd.Dataframe[]
for row in df['key1'].index:
    new_df['keyl'][row] == df['keyl'][row]
    if df['keyl'][row] == 0:
         new_df['key1'].append(df2,ignore_index=True)

Obviously it is not working, please send some help. Ideally I would like to modify the same dataframe instead of creating a new one. Thanks

EDIT Below is a drawing of how my data looks like enter image description here

And below is what I am trying to achieve enter image description here

Upvotes: 1

Views: 179

Answers (1)

Pygirl
Pygirl

Reputation: 13349

You can use mask them by zero and assign a key. Based on the key you can group them and transform them to columns.

All credit goes to this answer. You will find a great explanation there.

df2 = df.mask((df['1'] == 0) )
df2['group'] = (df2['1'].shift(1).isnull() & df2['1'].notnull()).cumsum()
df2 = df2.dropna()
df2.pivot(columns='group')

1
group   1   2   3   4
key1/1  0.5 NaN NaN NaN
key1/10 NaN NaN 0.5 NaN
key1/11 NaN NaN 0.5 NaN
key1/12 NaN NaN 0.5 NaN
key1/14 NaN NaN NaN 0.5
key1/15 NaN NaN NaN 0.5
key1/2  0.5 NaN NaN NaN
key1/5  NaN 0.6 NaN NaN
key1/6  NaN 0.7 NaN NaN

Your group key will look like this:

1   group
key1/1  0.5 1
key1/2  0.5 1
key1/3  NaN 1
key1/4  NaN 1
key1/5  0.6 2
key1/6  0.7 2
key1/7  NaN 2
key1/8  NaN 2
key1/9  NaN 2
key1/10 0.5 3
key1/11 0.5 3
key1/12 0.5 3
key1/13 NaN 3
key1/14 0.5 4
key1/15 0.5 4

This data you can translate it into column format.

Complete solution:

df2 = df.mask((df['1'] == 0) )
df2['group'] = (df2['1'].shift(1).isnull() & df2['1'].notnull()).cumsum()
df2 = df2.dropna()
x = df2.groupby('group')['1'].apply(list)
df3 = pd.DataFrame(x.values.tolist()).T
df3.index = [f"key1/{i}" for i in range(1,len(df3)+1)]

        0   1   2   3
key1/1  0.5 0.6 0.5 0.5
key1/2  0.5 0.7 0.5 0.5
key1/3  NaN NaN 0.5 NaN

If you want something in that format you need to have data like this:

group
1         [0.5, 0.5]
2         [0.6, 0.7]
3    [0.5, 0.5, 0.5]
4         [0.5, 0.5]
Name: 1, dtype: object

Update1:

Assuming data to be:

enter image description here

def func(r):
    df2 = r.mask((r['1'] == 0) )
    df2['group'] = (df2['1'].shift(1).isnull() & df2['1'].notnull()).cumsum()
    df2 = df2.dropna()
    x = df2.groupby('group')['1'].apply(list)
    df3 = pd.DataFrame(x.values.tolist()).T
#     df3.index = [r.name]*len(df3)
    return (df3)

df.groupby(df.index).apply(func)

enter image description here

Upvotes: 1

Related Questions