kurdtc

Reputation: 1621

Speed up list creation from pandas dataframe

I have a pandas dataframe df from which I need to create a list Row_list.

import pandas as pd

df = pd.DataFrame([[1, 572548.283, 166424.411, -11.849, -11.512], 
                   [2, 572558.153, 166442.134, -11.768, -11.983],
                   [3, 572124.999, 166423.478, -11.861, -11.512],
                   [4, 572534.264, 166414.417, -11.123, -11.993]], 
                   columns=['PointNo','easting', 'northing', 't_20080729','t_20090808'])

I am able to create the list in the required format with the code below, but my dataframe has up to 8 million rows and the list creation is very slow.

def test_get_value_iterrows(df):
    Row_list = []
    for index, rows in df.iterrows():
        entirerow = df.values[index,].tolist()
        entirerow.append((df.iloc[index,1],df.iloc[index,2]))
        Row_list.append(entirerow)
    return Row_list

%timeit test_get_value_iterrows(df)

436 µs ± 6.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Dropping df.iterrows() and looping over df.index instead (still using df.iloc for the tuple) is a little faster:

def test_get_value(df):
    Row_list = []
    for i in df.index:
        entirerow = df.values[i,].tolist()
        entirerow.append((df.iloc[i,1],df.iloc[i,2]))
        Row_list.append(entirerow)
    return Row_list

%timeit test_get_value(df)

270 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Is there a faster way to build this list?

Upvotes: 1

Views: 121

Answers (1)

jezrael

Reputation: 862406

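Use a list comprehension over df.values.tolist(), which converts the whole frame to nested Python lists once instead of indexing it row by row. The timings below use the sample frame repeated 10,000 times (40,000 rows):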

df = pd.concat([df] * 10000, ignore_index=True)

In [123]: %timeit [[*x, (x[1], x[2])] for x in df.values.tolist()]
27.8 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [124]: %timeit [x + [(x[1], x[2])] for x in df.values.tolist()]
26.6 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [125]: %timeit (test_get_value(df))
41.2 s ± 1.97 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
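
For reference, here is a minimal self-contained sketch of the list-comprehension approach applied to the sample data from the question. The helper names build_row_list and build_row_list_loop are made up for illustration and are not part of pandas; the loop version is only there to check that both approaches produce the same rows.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 572548.283, 166424.411, -11.849, -11.512],
     [2, 572558.153, 166442.134, -11.768, -11.983],
     [3, 572124.999, 166423.478, -11.861, -11.512],
     [4, 572534.264, 166414.417, -11.123, -11.993]],
    columns=['PointNo', 'easting', 'northing', 't_20080729', 't_20090808'])


def build_row_list(df):
    # Hypothetical helper: convert the frame to nested lists once,
    # then append the (easting, northing) tuple to each row.
    return [row + [(row[1], row[2])] for row in df.values.tolist()]


def build_row_list_loop(df):
    # Row-by-row reference version, used only to compare results.
    values = df.values  # take the array once instead of on every iteration
    out = []
    for i in range(len(df)):
        row = values[i].tolist()
        row.append((row[1], row[2]))
        out.append(row)
    return out


Row_list = build_row_list(df)
assert Row_list == build_row_list_loop(df)
print(Row_list[0])
```

Note that df.values upcasts the integer PointNo column to float because the other columns are floats, so PointNo comes back as 1.0, 2.0, and so on; the original loop behaves the same way. If PointNo has to stay an integer, it would need to be converted separately.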

Upvotes: 1
