Ibrahim Ayoup

Reputation: 473

Add empty row after every unique column value

I'm trying to add an empty row after each group of rows that share a Salary value (rows with a duplicated Salary should stay together, with no empty row between them).

Current input:

    Name     Country  Department  Salary
0   John     USA      Finance     12000
1   John     Egypt    Finance     12000
2   Jack     France   Marketing   13000
3   Geroge   UK       Accounts    11000
4   Steven   India    Data        10000
5   Mohammed Jordan   IT          10000

Expected output:

    Name     Country  Department  Salary
0   John     USA      Finance     12000
1   John     Egypt    Finance     12000

2   Jack     France   Marketing   13000

3   Geroge   UK       Accounts    11000

4   Steven   India    Data        10000
5   Mohammed Jordan   IT          10000

What I have tried:

import pandas as pd

df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
                            3: 'Geroge',4: 'Steven',5: 'Mohammed'},
                   'Country': {0: 'USA',1: 'Egypt',2: 'France',
                               3: 'UK',4: 'India',5: 'Jordan'},
                   'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
                                  3: 'Accounts',4: 'Data',5: 'IT'},
                   'Salary': {0: 12000, 1: 12000, 2: 13000, 
                              3: 11000, 4: 10000, 5: 10000}})

df.index = range(0, 2*len(df), 2)
df2 = df.reindex(index=range(2*len(df)))

What I got (which is incorrect):

    Name      Country   Department  Salary
0   John      USA       Finance     12000.0
1   NaN       NaN       NaN         NaN
2   John      Egypt     Finance     12000.0
3   NaN       NaN       NaN         NaN
4   Jack      France    Marketing   13000.0
5   NaN       NaN       NaN         NaN
6   Geroge    UK        Accounts    11000.0
7   NaN       NaN       NaN         NaN
8   Steven    India     Data        10000.0
9   NaN       NaN       NaN         NaN
10  Mohammed  Jordan    IT          10000.0
11  NaN       NaN       NaN         NaN

I would appreciate it if someone could help me here.

Upvotes: 0

Views: 720

Answers (3)

Sabil

Reputation: 4510

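This should work for you:

import pandas as pd

def f(x):
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
    # Add one row of empty strings (with an empty index label) after the group.
    blank = pd.DataFrame('', columns=x.columns, index=[''])
    return pd.concat([x, blank])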


df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
                            3: 'Geroge',4: 'Steven',5: 'Mohammed'},
                   'Country': {0: 'USA',1: 'Egypt',2: 'France',
                               3: 'UK',4: 'India',5: 'Jordan'},
                   'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
                                  3: 'Accounts',4: 'Data',5: 'IT'},
                   'Salary': {0: 12000, 1: 12000, 2: 13000, 
                              3: 11000, 4: 10000, 5: 10000}})

df = df.groupby('Salary', sort=False, group_keys=False).apply(f)

print (df)

Output:

       Name Country Department Salary
0      John     USA    Finance  12000
1      John   Egypt    Finance  12000
                                     
2      Jack  France  Marketing  13000
                                     
3    Geroge      UK   Accounts  11000
                                     
4    Steven   India       Data  10000
5  Mohammed  Jordan         IT  10000
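
If you prefer to avoid groupby().apply(), here is a minimal sketch that builds on the reindex idea from the question. It assumes that equal salaries always sit on adjacent rows, as in the sample data; the names run_id, spaced and result are only illustrative. The separator rows come out as NaN rather than empty strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

# Number each run of consecutive equal salaries, starting at 0.
run_id = df['Salary'].ne(df['Salary'].shift()).cumsum() - 1

# Push every row down by its run number, which leaves exactly one
# unused index label between two neighbouring runs.
spaced = df.copy()
spaced.index = np.arange(len(df)) + run_id.to_numpy()

# reindex fills the unused labels with all-NaN separator rows.
result = spaced.reindex(range(spaced.index[-1] + 1))
print(result)
```

Because nothing is added after the last run, there is no trailing separator row to trim.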

Upvotes: 2

Anurag Dabas

Reputation: 24314

IIUC, you can append an empty (all-NaN) DataFrame after each group while iterating over groupby():

I grouped by 'Department' here, but you can also group by 'Salary' or another column according to your needs.

l=[]
for x,y in df.groupby('Department',sort=False):
    l.append(y)
    l.append(pd.DataFrame([[float('NaN')]*len(y.columns)],columns=y.columns))

df=pd.concat(l,ignore_index=True).iloc[:-1]

Output of df:

    Name        Country     Department  Salary
0   John        USA         Finance     12000.0
1   John        Egypt       Finance     12000.0
2   NaN         NaN         NaN         NaN
3   Jack        France      Marketing   13000.0
4   NaN         NaN         NaN         NaN
5   Geroge      UK          Accounts    11000.0
6   NaN         NaN         NaN         NaN
7   Steven      India       Data        10000.0
8   NaN         NaN         NaN         NaN
9   Mohammed    Jordan      IT          10000.0
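
Note that Salary becomes float in the output above because NaN cannot be stored in an int64 column. Here is a minimal sketch of one way around that, assuming the nullable Int64 dtype (available since pandas 1.0) is acceptable for your use case. It groups by 'Salary' as in the question; pieces and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

pieces = []
for _, group in df.groupby('Salary', sort=False):
    pieces.append(group)
    # One all-NaN separator row with the same columns as df.
    pieces.append(pd.DataFrame([[float('nan')] * len(df.columns)],
                               columns=df.columns))

# Drop the trailing separator, then renumber the rows.
out = pd.concat(pieces[:-1], ignore_index=True)

# Convert the float column back to integers; missing values become <NA>.
out['Salary'] = out['Salary'].astype('Int64')
print(out)
```

Recent pandas versions may emit a FutureWarning about concatenating all-NA frames here; the resulting values are the same.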

Upvotes: 2

Fergus Kwan

Reputation: 373

I'm not sure this is the most elegant way to do it, but here is how I would approach the problem.

pandas.DataFrame.groupby groups the rows of a DataFrame by the unique values of a given column. Iterating over the result yields a (key, sub-DataFrame) pair for each group. Here you can split df by unique Salary, then append an empty row to each piece before concatenating the pieces again.

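output_df_list = []
# sort=False keeps the groups in their original order of appearance
for key, groupby_df in df.groupby('Salary', sort=False):
    # DataFrame.append was removed in pandas 2.0, so build a one-row NaN frame
    nan_row = pd.DataFrame([[float('nan')] * len(df.columns)], columns=df.columns)
    groupby_df_with_nan_tail = pd.concat([groupby_df, nan_row], ignore_index=True)
    output_df_list.append(groupby_df_with_nan_tail)

# ignore_index=True avoids repeated index labels in the final result
output_df = pd.concat(output_df_list, ignore_index=True)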
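
To make the iteration concrete, here is a small sketch of what each pass through such a loop receives. It uses a cut-down copy of the question's data, and group_sizes is only an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

group_sizes = {}
for key, group in df.groupby('Salary', sort=False):
    # key is one unique Salary value; group is the slice of df holding it.
    group_sizes[int(key)] = len(group)
    print(key, group['Name'].tolist())
```

With the default sort=True, the groups would instead arrive in ascending Salary order, which changes the row order of the final result.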

Upvotes: 1
