Reputation: 91
I would like to create a new column (Col_Val) on my DataFrame (Global_Dataset) based on another DataFrame (List_Data). I need faster code because Global_Dataset has 2 million rows and List_Data contains 50,000 rows.
Col_Val must contain the value from the Value column whose Key matches Col_Key.
List_Data:
id  Key  Value
 1    5      0
 2    7      1
 3    9      2
Global_Dataset:
id  Col_Key  Col_Val
 1        9        2
 2        5        0
 3        9        2
 4        7        1
 5        7        1
 6        5        0
 7        9        2
 8        7        1
 9        9        2
10        5        0
I have tried the code below, but it takes a long time to execute. Is there a faster way to achieve my goal?
Col_Val = []
for i in range(len(List_Data)):
    for j in range(len(Global_Data)):
        if List_Data.get_value(i, 'Key') == Global_Data.get_value(j, 'Col_Key'):
            Col_Val.append(List_Data.get_value(i, 'Value'))

Global_Data['Col_Val'] = Col_Val
PS: I have tried loc and iloc instead of get_value, but they are just as slow.
Upvotes: 1
Views: 42
Reputation: 5955
There is no reason to loop through anything, either manually or with iterrows. If I understand your problem, this should be a simple merge operation.
df

    Key  Value
id
1     5      0
2     7      1
3     9      2
global_df

    Col_Key
id
1         9
2         5
3         9
4         7
5         7
6         5
7         9
8         7
9         9
10        5
global_df.reset_index()\
    .merge(df, left_on='Col_Key', right_on='Key')\
    .drop('Key', axis=1)\
    .set_index('id')\
    .sort_index()
    Col_Key  Value
id
1         9      2
2         5      0
3         9      2
4         7      1
5         7      1
6         5      0
7         9      2
8         7      1
9         9      2
10        5      0
Note that the essence of this is the global_df.merge(...) call; the extra operations only keep the original indexing and remove unwanted extra columns. I encourage you to try each step individually to see the results.
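For completeness, here is a self-contained sketch of the same steps that you can paste into a fresh session. The DataFrame construction is my own reconstruction of the example data above, and it assumes pandas is imported as pd:

import pandas as pd

# Rebuild the example frames from the question (column names and 'id' index assumed).
df = pd.DataFrame({'Key': [5, 7, 9], 'Value': [0, 1, 2]},
                  index=pd.Index([1, 2, 3], name='id'))
global_df = pd.DataFrame({'Col_Key': [9, 5, 9, 7, 7, 5, 9, 7, 9, 5]},
                         index=pd.Index(range(1, 11), name='id'))

# Merge on the key columns, then restore the original 'id' index and row order.
result = (global_df.reset_index()
                   .merge(df, left_on='Col_Key', right_on='Key')
                   .drop('Key', axis=1)
                   .set_index('id')
                   .sort_index())
print(result)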
Upvotes: 2
Reputation: 7361
Try this:
data_dict = {key : value for key, value in zip(List_Data['Key'], List_Data['Value'])}
Global_Data['Col_Val'] = pd.Series([data_dict[key] for key in Global_Data['Col_Key']])
I don't know how long it will take on your machine with the amount of data you need to handle, but it should be faster than what you are using now.
You could also generate the dictionary with data_dict = {row['Key']: row['Value'] for _, row in List_Data.iterrows()}, but on my machine that is slower than what I proposed above.
This works under the assumption that all the keys in Global_Data['Col_Key'] are present in List_Data['Key']; otherwise you will get a KeyError.
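If that assumption might not hold, one variation I would suggest (not part of the answer above) is pandas' built-in Series.map, which performs the same dictionary lookup in a vectorized way and fills missing keys with NaN instead of raising:

# Same data_dict as above; unknown keys become NaN rather than raising a KeyError.
Global_Data['Col_Val'] = Global_Data['Col_Key'].map(data_dict)

This also avoids building the intermediate Python list before constructing the new column.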
Upvotes: 2