LeoFr

Reputation: 131

Iterate through all rows of a dataframe

My dataframe has 500k rows and 3 columns, and I want to create a JSON file from it with the following structure:

jsonFile = {
    "attribute1": [
        {
          "key1": "a",
          "key2": "b",
          "key3": c
        },
        {
          "key1": "d",
          "key2": "e",
          "key3": f
        },
        (...)
    ]
}

Doing it like this

jsonFile = {'attribute1': []}

for i in range(0,len(df)):
    jsonFile['attribute1'].append({
            "key1": df["col1"][i],
            "key2": df["col2"][i],
            "key3": df["col3"][i]
        })

takes far too long. I read something about NumPy vectorization, but I don't know whether it applies to my case, because every example I saw used it to add new columns.

Upvotes: 0

Views: 58

Answers (1)

DeepSpace

Reputation: 81604

You should avoid using Python for loops with dataframes as much as possible.

In this case, you can rename the columns using .rename then use .to_dict with orient='records':

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
output = {'attribute1': df.rename(columns={'col1': 'key1', 'col2': 'key2'}).to_dict(orient='records')}
print(output)

will output

{'attribute1': [{'key1': 1, 'key2': 4}, {'key1': 2, 'key2': 5}, {'key1': 3, 'key2': 6}]}
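Since the question asks for a JSON file rather than a dict, here is a minimal sketch of one way to get there. The sample data and the names `renamed`, `records` and `json_text` are made up for illustration. It goes through `to_json` so that pandas handles the conversion of NumPy values, which is an assumption about what is convenient here, not the only way to do it.

```python
import json

import pandas as pd

# Hypothetical sample data with the three columns from the question.
df = pd.DataFrame({'col1': ['a', 'd'], 'col2': ['b', 'e'], 'col3': [1, 2]})

# Rename the columns to the keys wanted in the JSON output.
renamed = df.rename(columns={'col1': 'key1', 'col2': 'key2', 'col3': 'key3'})

# to_json(orient='records') returns a JSON string with one object per row;
# parsing it back gives plain Python values that json can serialize.
records = json.loads(renamed.to_json(orient='records'))

output = {'attribute1': records}
json_text = json.dumps(output, indent=2)
print(json_text)
```

To write a file instead of a string, `json.dump(output, f)` with an open file handle `f` should work the same way.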

Checking the timings for 500K rows, the above method is roughly 11 times faster on my machine:

from timeit import Timer

import pandas as pd

df = pd.DataFrame({'col1': list(range(500000)), 'col2': list(range(500000))})

def rename_and_to_dict():
    {'attribute1': df.rename(columns={'col1': 'key1', 'col2': 'key2'}).to_dict(orient='records')}

def for_loop():
    output = {'attribute1': []}
    for i in range(0, len(df)):
        output['attribute1'].append({
            "key1": df["col1"][i],
            "key2": df["col2"][i]
        })


print('rename_and_to_dict', min(Timer(rename_and_to_dict).repeat(1, 1)))
print('for_loop', min(Timer(for_loop).repeat(1, 1)))

Outputs

rename_and_to_dict 0.3934917000000001
for_loop 4.469996500000001
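Before trusting a timing comparison it is worth checking that both approaches build the same records. A small sketch of such a check on a 3-row frame (the names `from_to_dict`, `from_loop` and `same` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# Records built by renaming the columns and calling to_dict.
from_to_dict = df.rename(columns={'col1': 'key1', 'col2': 'key2'}).to_dict(orient='records')

# Records built row by row, as in the question.
from_loop = []
for i in range(len(df)):
    from_loop.append({'key1': df['col1'][i], 'key2': df['col2'][i]})

# The values compare equal even if one side holds NumPy integers.
same = from_to_dict == from_loop
print(same)
```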

Upvotes: 4
