Reputation: 1595
Consider dataframes df1
, df2
, and df3
.
df1
and df2
have an id
column, and df3
has a from_id
and to_id
column.
I need to iterate over all rows of df3
, and replace from_id
and to_id
with new unique randomly generated UUIDs, and then update those in df1
and df2
where (id == from_id) | (id == to_id)
(before the change to UUID).
I originally wanted to iterate over all rows of df3
and simply check both df1
and df2
if they contain the original from_id
or to_id
inside the id
column before replacing both, but I found that iterating over pandas rows is a bad idea and slow.
I'm not sure how I can apply the other mentioned methods in that post to this problem since I'm not applying a simple function or calculating anything, and I think the way I had intended to do it would be too slow for big dataframes.
My current method that I believe to be slow and inefficient:
import pandas as pd
import uuid
def rand_uuid():
return uuid.uuid1()
def update_ids(df_places: pd.DataFrame, df_transitions: pd.DataFrame, df_arcs: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
for i in range(len(df_arcs)):
new_uuid_from = __rand_uuid()
new_uuid_to = __rand_uuid()
new_uuid_arc = __rand_uuid()
df_transitions.loc[df_transitions.id == df_arcs.iloc[i]['sourceId'], 'id'] = new_uuid_from
df_transitions.loc[df_transitions.id == df_arcs.iloc[i]['destinationId'], 'id'] = new_uuid_to
df_places.loc[df_places.id == df_arcs.iloc[i]['sourceId'], 'id'] = new_uuid_from
df_places.loc[df_places.id == df_arcs.iloc[i]['destinationId'], 'id'] = new_uuid_to
df_arcs.iloc[i]['sourceId'] = new_uuid_from
df_arcs.iloc[i]['destinationId'] = new_uuid_to
df_arcs.iloc[i]['id'] = new_uuid_arc
return df_places, df_transitions, df_arcs
Here df_places
and df_transitions
are above mentioned df1
and df2
, and df_arcs
is df3
Example df_places
+---+----+
| | id |
+---+----+
| 1 | a1 |
+---+----+
| 2 | c1 |
+---+----+
Example df_transitions
:
+---+----+
| | id |
+---+----+
| 1 | b1 |
+---+----+
Example df_arcs
:
+---+----------+---------------+
| | sourceId | destinationId |
+---+----------+---------------+
| 1 | a1 | b1 |
+---+----------+---------------+
| 2 | b1 | c1 |
+---+----------+---------------+
Upvotes: 0
Views: 123
Reputation: 3639
A very simple approach:
import itertools
import uuid
def rand_uuid():
return uuid.uuid4()
rep_dict = {i: rand_uuid() for i in itertools.chain(df1.id, df2.id)}
df3.replace(rep_dict, inplace=True)
df3.id = df3.id.map(lambda x: rand_uuid())
df1.replace(rep_dict, inplace=True)
df2.replace(rep_dict, inplace=True)
Upvotes: 2