Reputation: 473
I'm trying to add an empty row after every unique Salary
column value (except that rows with duplicated values should stay together, with no empty row between them).
Current input:
Name Country Department Salary
0 John USA Finance 12000
1 John Egypt Finance 12000
2 Jack France Marketing 13000
3 Geroge UK Accounts 11000
4 Steven India Data 10000
5 Mohammed Jordan IT 10000
Expected Output :
Name Country Department Salary
0 John USA Finance 12000
1 John Egypt Finance 12000

2 Jack France Marketing 13000

3 Geroge UK Accounts 11000

4 Steven India Data 10000
5 Mohammed Jordan IT 10000
What I have tried:
import pandas as pd
df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
3: 'Geroge',4: 'Steven',5: 'Mohammed'},
'Country': {0: 'USA',1: 'Egypt',2: 'France',
3: 'UK',4: 'India',5: 'Jordan'},
'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
3: 'Accounts',4: 'Data',5: 'IT'},
'Salary': {0: 12000, 1: 12000, 2: 13000,
3: 11000, 4: 10000, 5: 10000}})
df.index = range(0, 2*len(df), 2)
df2 = df.reindex(index=range(2*len(df)))
What I got (which is incorrect):
Name Country Department Salary
0 John USA Finance 12000.0
1 NaN NaN NaN NaN
2 John Egypt Finance 12000.0
3 NaN NaN NaN NaN
4 Jack France Marketing 13000.0
5 NaN NaN NaN NaN
6 Geroge UK Accounts 11000.0
7 NaN NaN NaN NaN
8 Steven India Data 10000.0
9 NaN NaN NaN NaN
10 Mohammed Jordan IT 10000.0
11 NaN NaN NaN NaN
I would appreciate it if someone could help me here.
Upvotes: 0
Views: 720
Reputation: 4510
This should work for you:
import pandas as pd
def f(x):
    # DataFrame.append was removed in pandas 2.0, so the blank row is added with pd.concat
    return pd.concat([x, pd.DataFrame('', columns=df.columns, index=[''])])
df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
3: 'Geroge',4: 'Steven',5: 'Mohammed'},
'Country': {0: 'USA',1: 'Egypt',2: 'France',
3: 'UK',4: 'India',5: 'Jordan'},
'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
3: 'Accounts',4: 'Data',5: 'IT'},
'Salary': {0: 12000, 1: 12000, 2: 13000,
3: 11000, 4: 10000, 5: 10000}})
df = df.groupby('Salary', sort=False, group_keys=False).apply(f)
print (df)
Output:
Name Country Department Salary
0 John USA Finance 12000
1 John Egypt Finance 12000

2 Jack France Marketing 13000

3 Geroge UK Accounts 11000

4 Steven India Data 10000
5 Mohammed Jordan IT 10000

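One side effect of using a blank-string row is worth checking: mixing '' into Salary means the column can no longer stay numeric. Here is a small sketch of that, using a cut-down two-column frame and an explicit loop instead of apply (both are my own simplifications, not part of the answer above):

```python
import pandas as pd

# Reduced version of the question's data, just enough to show the effect
df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack'],
    'Salary': [12000, 12000, 13000],
})

# One row of empty strings, reused after every group
blank = pd.DataFrame('', columns=df.columns, index=[''])

pieces = []
for _, group in df.groupby('Salary', sort=False):
    pieces.extend([group, blank])
out = pd.concat(pieces)

# Salary now holds both integers and '' so pandas falls back to object dtype
print(out['Salary'].dtype)

# If you still need to do arithmetic, convert back; '' becomes NaN
numeric = pd.to_numeric(out['Salary'], errors='coerce')
print(numeric.mean())
```

So the blank-string version is probably best kept for display or export, with calculations done on the original df.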
Upvotes: 2
Reputation: 24314
IIUC:
Try appending an empty DataFrame after each group while iterating over groupby():
I grouped by 'Department' here, but you can also group by 'Salary' or another column, according to your needs.
l = []
for x, y in df.groupby('Department', sort=False):
    l.append(y)
    l.append(pd.DataFrame([[float('NaN')] * len(y.columns)], columns=y.columns))
df = pd.concat(l, ignore_index=True).iloc[:-1]
Output of df:
Name Country Department Salary
0 John USA Finance 12000.0
1 John Egypt Finance 12000.0
2 NaN NaN NaN NaN
3 Jack France Marketing 13000.0
4 NaN NaN NaN NaN
5 Geroge UK Accounts 11000.0
6 NaN NaN NaN NaN
7 Steven India Data 10000.0
8 NaN NaN NaN NaN
9 Mohammed Jordan IT 10000.0
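Salary shows up as 12000.0 in the output above because NaN cannot be stored in a plain int64 column, so pandas upcasts it to float. If that bothers you, one option is the nullable Int64 dtype. A minimal sketch, assuming you group by 'Salary' and that missing values displayed as <NA> are acceptable (the variable names are mine):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

pieces = []
for _, group in df.groupby('Salary', sort=False):
    pieces.append(group)
    # One all-NaN separator row after each group
    pieces.append(pd.DataFrame([[float('nan')] * len(df.columns)], columns=df.columns))

# Drop the trailing separator, as in the loop above
out = pd.concat(pieces, ignore_index=True).iloc[:-1]

# Nullable integer dtype: keeps whole numbers, shows gaps as <NA>
out['Salary'] = out['Salary'].astype('Int64')
print(out)
```

Recent pandas versions may emit a FutureWarning about concatenating all-NA entries here; as far as I can tell the result is the same either way.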
Upvotes: 2
Reputation: 373
I'm not sure this is the most elegant way to do it, but here is how I would approach the problem.
pandas.DataFrame.groupby groups the rows of a DataFrame by the unique values of a column. Iterating over the result yields a (key, sub-DataFrame) pair for each group, so you can first split df by unique Salary value and then append an empty row to each piece.
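output_df_list = []
# sort=False keeps the groups in their original order instead of sorting them by salary
for key, groupby_df in df.groupby('Salary', sort=False):
    # DataFrame.append was removed in pandas 2.0, so add the all-NaN row with pd.concat
    nan_row = pd.DataFrame([[float('nan')] * len(df.columns)], columns=df.columns)
    groupby_df_with_nan_tail = pd.concat([groupby_df, nan_row], ignore_index=True)
    output_df_list.append(groupby_df_with_nan_tail)
output_df = pd.concat(output_df_list)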
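If you would rather not loop over groups at all, the reindex idea from the question can be adapted. This is only a sketch, and it assumes rows with the same Salary are already next to each other (it separates consecutive runs, not groups in the groupby sense); run_id and spaced are names I made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

# Number each run of equal consecutive salaries: 0, 0, 1, 2, 3, 3
run_id = df['Salary'].ne(df['Salary'].shift()).cumsum() - 1

# Push every row down by the number of runs before it,
# which leaves exactly one unused index at each run boundary
spaced = df.copy()
spaced.index = np.arange(len(df)) + run_id.to_numpy()

# Reindexing over the full range turns the unused indices into all-NaN rows
spaced = spaced.reindex(range(spaced.index.max() + 1))
print(spaced)
```

No trailing empty row is produced, since gaps only appear between runs.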
Upvotes: 1