Reputation: 499
i have 2 dataframes parent and child, i want to concatenate both in groupby manner
df_parent
parent parent_value
0 Super Sun 0
1 Alpha Mars 4
2 Pluto 9
df_child
child value
0 Planet Sun 100
1 one Sun direction 101
2 Ice Pluto Tune 101
3 Life on Mars 99
4 Mars Robot 105
5 Sun Twins 200
I want the ouput to be in order order = ['Sun', 'Pluto', 'Mars']
Sun
-childs
Pluto
-childs
Mards
-childs
I want to find the child with keyword wise, refer parent_dict
parent_dict = {'Super Sun': 'Sun',
'Alpha Mars': 'Mars',
'Pluto': 'Pluto'}
expected output
child value
0 Super Sun 0 # parent
1 Planet Sun 100 # child
2 one Sun direction 101 # child
3 Sun Twins 200 # child
4 Pluto 9 # parent
5 Ice Pluto Tune 101 # child
6 Alpha Mars 4 # parent
7 Life on Mars 99 # child
8 Mars Robot 105 # child
So far i have tried to iterate master list and both dfs, but expected output is not coming, here is my code
output_df = pd.DataFrame()
for o in order:
key = o
for j, row in df_parent.iterrows():
if key in row[0]:
output_df.at[j, 'parent'] = key
output_df.at[j, 'value'] = row[1]
for k, row1 in df_child.iterrows():
if key in row1[0]:
output_df.at[j, 'parent'] = key
output_df.at[j, 'value'] = row[1]
print(output_df)
Output:
parent value
0 Sun 0.0
2 Pluto 9.0
1 Mars 4.0
Upvotes: 1
Views: 735
Reputation: 107587
Consider concatenating both dataframes and order by a keyword find:
order = ['Sun', 'Pluto', 'Mars']
def find_keyword(str_param):
output = None
# ITERATE THROUGH LIST AND RETURN MATCHING POSITION
for i,v in enumerate(order):
if v in str_param:
output = i
return output
# RENAME COLS AND CONCAT DFs
df_combined = pd.concat([df_parent.rename(columns={'parent':'item', 'parent_value':'value'}),
df_child.rename(columns={'child':'item'})],
ignore_index=True)
# CREATE KEYWORD COL WITH DEFINED FUNCTION
df_combined['keyword'] = df_combined['item'].apply(find_keyword)
# SORT BY KEYWORD AND DROP HELPER COL
df_combined = df_combined.sort_values(['keyword', 'value'])\
.drop(columns=['keyword']).reset_index(drop=True)
print(df_combined)
# item value
# 0 Super Sun 0
# 1 Planet Sun 100
# 2 one Sun direction 101
# 3 Sun Twins 200
# 4 Pluto 9
# 5 Ice Pluto Tune 101
# 6 Alpha Mars 4
# 7 Life on Mars 99
# 8 Mars Robot 105
Upvotes: 0
Reputation: 29635
You can use append
with both dataframe after some preparation. First create a column keyword in both df_parent
and df_child
used for sorting later. To do so, you an use np.select
such as:
import pandas as pd
order = ['Sun', 'Pluto', 'Mars']
condlist_parent = [df_parent['parent'].str.contains(word) for word in order]
df_parent['keyword'] = pd.np.select(condlist = condlist_parent, choicelist = order, default = None)
condlist_child = [df_child['child'].str.contains(word) for word in order]
df_child['keyword'] = pd.np.select(condlist = condlist_child, choicelist = order, default = None)
giving for example for df_parent
:
parent parent_value keyword
0 Super Sun 0 Sun
1 Alpha Mars 4 Mars
2 Pluto 9 Pluto
Now you can use append
and also Categorical
to order the dataframe according to the list order
. The rename
is used to fit your expected output and for the append
working as wanted (columns should have the same name in both dataframe).
df_all = (df_parent.rename(columns={'parent':'child','parent_value':'value'})
.append(df_child,ignore_index=True))
# to order the column keyword with the list order
df_all['keyword'] = pd.Categorical(df_all['keyword'], ordered=True, categories=order)
# now sort_values by the column keyword, reset_index and drop the column keyword
df_output = (df_all.sort_values('keyword')
.reset_index(drop=True).drop('keyword',1)) # last two methods are for cosmetic
The output is then:
child value
0 Super Sun 0
1 Planet Sun 100
2 one Sun direction 101
3 Sun Twins 200
4 Pluto 9
5 Ice Pluto Tune 101
6 Alpha Mars 4
7 Life on Mars 99
8 Mars Robot 105
Note: The fact that the parents are before childs after sorting on 'keyword' is that df_child
is appened to df_parent
, and not in the reverse.
Upvotes: 1
Reputation: 499
Here is one solution, by iterating both dataframes, but this seems a very very long procedure
output_df = pd.DataFrame()
c = 0
for o in order:
key = o
for j, row in df_parent.iterrows():
if key in row[0]:
output_df.at[c, 'parent'] = row[0]
output_df.at[c, 'value'] = row[1]
c += 1
for k, row1 in df_child.iterrows():
if key in row1[0]:
output_df.at[c, 'parent'] = row1[0]
output_df.at[c, 'value'] = row1[1]
c += 1
Output:
parent value
0 Super Sun 0.0
1 Planet Sun 100.0
2 one Sun direction 101.0
3 Sun Twins 200.0
4 Pluto 9.0
5 Ice Pluto Tune 101.0
6 Alpha Mars 4.0
7 Life on Mars 99.0
8 Mars Robot 105.0
Upvotes: 0