Reputation: 135
I am new to Pandas in python. I have a dataframe with 2 keys 15 rows each and 1 column like below
1
key1/1 0.5
key1/2 0.5
key1/3 0
key1/4 0
key1/5 0.6
key1/6 0.7
key1/7 0
key1/8 0
key1/9 0
key1/10 0.5
key1/11 0.5
key1/12 0.5
key1/13 0
key1/14 0.5
key1/15 0.5
key2/1 0.4
key2/2 0.2
key2/3 0
key2/4 0
key2/5 0.1
key2/6 0.2
key2/7 0
key2/8 0
key2/9 0.3
key2/10 0.2
key2/11 0
key2/12 0.5
key2/13 0
key2/14 0
key2/15 0.5
I want to iterate the rows of the dataframe so each time it meets a 'zero' it creates a new column like below
1 2 3 4
key1/1 0.5 0.6 0.5 0.5
key1/2 0.5 0.7 0.5 0.5
key1/3 nan nan 0.5 nan
key1/4 nan nan nan nan
1 2 3 4 5
key2/1 0.4 0.1 0.3 0.5 0.5
key2/2 0.2 0.2 0.2 nan nan
key2/3 nan nan nan nan nan
key2/4 nan nan nan nan nan
I have tried the following code trying to iterate 'key1' only
df2=pd.Dataframe[]
for row in df['key1'].index:
new_df['keyl'][row] == df['keyl'][row]
if df['keyl'][row] == 0:
new_df['key1'].append(df2,ignore_index=True)
Obviously it is not working, please send some help. Ideally I would like to modify the same dataframe instead of creating a new one. Thanks
EDIT
Below is a drawing of how my data looks like
And below is what I am trying to achieve
Upvotes: 1
Views: 179
Reputation: 13349
You can use mask them by zero and assign a key. Based on the key you can group them and transform them to columns.
All credit goes to this answer. You will find a great explanation there.
df2 = df.mask((df['1'] == 0) )
df2['group'] = (df2['1'].shift(1).isnull() & df2['1'].notnull()).cumsum()
df2 = df2.dropna()
df2.pivot(columns='group')
1
group 1 2 3 4
key1/1 0.5 NaN NaN NaN
key1/10 NaN NaN 0.5 NaN
key1/11 NaN NaN 0.5 NaN
key1/12 NaN NaN 0.5 NaN
key1/14 NaN NaN NaN 0.5
key1/15 NaN NaN NaN 0.5
key1/2 0.5 NaN NaN NaN
key1/5 NaN 0.6 NaN NaN
key1/6 NaN 0.7 NaN NaN
Your group key will look like this:
1 group
key1/1 0.5 1
key1/2 0.5 1
key1/3 NaN 1
key1/4 NaN 1
key1/5 0.6 2
key1/6 0.7 2
key1/7 NaN 2
key1/8 NaN 2
key1/9 NaN 2
key1/10 0.5 3
key1/11 0.5 3
key1/12 0.5 3
key1/13 NaN 3
key1/14 0.5 4
key1/15 0.5 4
This data you can translate it into column format.
df2 = df.mask((df['1'] == 0) )
df2['group'] = (df2['1'].shift(1).isnull() & df2['1'].notnull()).cumsum()
df2 = df2.dropna()
x = df2.groupby('group')['1'].apply(list)
df3 = pd.DataFrame(x.values.tolist()).T
df3.index = [f"key1/{i}" for i in range(1,len(df3)+1)]
0 1 2 3
key1/1 0.5 0.6 0.5 0.5
key1/2 0.5 0.7 0.5 0.5
key1/3 NaN NaN 0.5 NaN
If you want something in that format you need to have data like this:
group
1 [0.5, 0.5]
2 [0.6, 0.7]
3 [0.5, 0.5, 0.5]
4 [0.5, 0.5]
Name: 1, dtype: object
Assuming data to be:
def func(r):
df2 = r.mask((r['1'] == 0) )
df2['group'] = (df2['1'].shift(1).isnull() & df2['1'].notnull()).cumsum()
df2 = df2.dropna()
x = df2.groupby('group')['1'].apply(list)
df3 = pd.DataFrame(x.values.tolist()).T
# df3.index = [r.name]*len(df3)
return (df3)
df.groupby(df.index).apply(func)
Upvotes: 1