FJ35
FJ35

Reputation: 33

Pandas: go through column and if nan value appears, send the remaning column values to new column

In case I have a dataframe as

Index     values
1          2
2          4
3          3
4          nan
5          nan
6          nan
7          9 
8          7
9          8
10         nan
11         5
12         1
13         6

How can I set new columns in a new dataframe for each vector with values different than nan? The solution would something like the following:

Index     values_1     values_2     values_3
1            2            9            5
2            4            7            1
3            3            8            6

Upvotes: 1

Views: 81

Answers (2)

David Erickson
David Erickson

Reputation: 16683

  1. You can get the cumsum() of values that are null to separate into groups, which is how you create columns. This works because it returns False for non-null rows, so when you take the cumsum(), you get the first group all as group 0, and the other groups will be separated into different group IDs as well.
  2. Null rows were used for logic of step 1, but are no longer needed so use dropna()
  3. Now, you reset the group numbers, so they are 1,2,3 with (df['col1'].ne(df['col1'].shift())).cumsum()
  4. You also have to create a new index using df1.groupby('col1').cumcount() + 1, in order fror the rows to be on the same line.
  5. Then, just pivot defining an index and columns.
  6. Finally, clean up the column names into one level with the last line of code.

# df1 = df1.drop('Index', axis=1) #include this line of code if "Index" is a column. Otherwise, omit.
df['col1'] = (df['values'].isnull().cumsum())
df = df.dropna()
df['col1'] = (df['col1'].ne(df['col1'].shift())).cumsum()
df['col2'] = df.groupby('col1').cumcount()
df = (df.pivot(index='col2', columns='col1').reset_index(drop=True)
        .add_prefix('values_').droplevel(0, axis=1))
# can also use # df.columns = [f'{col[0]}_{col[1]}' for col in df.columns] # instead of # .add_prefix('values_').droplevel(0, axis=1))
df
Out[1]: 
   values_1  values_2  values_3
0       2.0       9.0       5.0
1       4.0       7.0       1.0
2       3.0       8.0       6.0

Upvotes: 2

Umar.H
Umar.H

Reputation: 23099

we can use a boolean to create a new index then use unstack and add_prefix

df['key'] = ((df['values'].isna()==True) & 
            (df['values'].shift(1).isna()==False)).cumsum() + 1

df = df.dropna()

df1 = df.set_index([df.groupby(['key']).cumcount(),'key'])\
        .unstack(1)\
        .add_prefix('values_')\
        .droplevel(0,1)


print(df1)

key  values_1  values_2  values_3
0         2.0       9.0       5.0
1         4.0       7.0       1.0
2         3.0       8.0       6.0

Upvotes: 2

Related Questions