Efficient Pandas Dataframe insert

Question

I'm trying to insert float values from a nested list such as [[(1,0.44),(2,0.5),(3,0.1)],[(2,0.63),(1,0.85),(3,0.11)],[...]] into a Pandas DataFrame that looks like a matrix, where the first element of each tuple gives the column:

    df =
          1     2     3
    1  0.44   0.5   0.1
    2  0.85  0.63  0.11
    3   ...   ...   ...

I tried this:

    for key, value in enumerate(outer_list):
      for tuplevalue in value:
        df.ix[key][tuplevalue[0]] = tuplevalue[1]

The problem is that my NxN matrix is about 10000x10000 elements, so this approach takes a really long time. Is there a faster way to do this?

(Unfortunately the values in the list are not ordered by the first tuple element)

Alexander · Accepted Answer

Use list comprehensions to first sort and extract your data. Then create your dataframe from the sorted and cleaned data.

    import pandas as pd

    data = [[(1, 0.44), (2, 0.50), (3, 0.10)],
            [(2, 0.63), (1, 0.85), (3, 0.11)]]

    # First, sort each row by its first tuple element (the column key).
    # This assumes every row contains every key exactly once.
    sorted_data = [sorted(row) for row in data]

    # Then extract the second element (the value) of each tuple.
    new_data = [[value for _, value in row] for row in sorted_data]

    # Now create a dataframe from your data.
    >>> pd.DataFrame(new_data)
          0     1     2
    0  0.44  0.50  0.10
    1  0.85  0.63  0.11
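
If sorting every row is not an option (for instance when some rows are missing keys), one possible alternative is to fill a preallocated NumPy array in a single indexed assignment and wrap it in a DataFrame afterwards. This is only a sketch, and it assumes the first tuple element is an integer column label running from 1 to N. The names `n_rows`, `n_cols`, `rows`, `cols`, `vals` and `matrix` are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Same sample data as above; each tuple is (column label, value), in any order.
data = [[(1, 0.44), (2, 0.50), (3, 0.10)],
        [(2, 0.63), (1, 0.85), (3, 0.11)]]

n_rows = len(data)
n_cols = 3  # assumption: column labels are the integers 1..n_cols

# Flatten the nested list into parallel arrays of row positions,
# column positions (shifted to 0-based), and values.
rows = np.array([i for i, row in enumerate(data) for _ in row])
cols = np.array([label - 1 for row in data for label, _ in row])
vals = np.array([value for row in data for _, value in row])

# Fill a preallocated matrix in one vectorized assignment.
# Cells that never receive a value stay NaN.
matrix = np.full((n_rows, n_cols), np.nan)
matrix[rows, cols] = vals

# Label rows and columns 1..N to match the layout in the question.
df = pd.DataFrame(matrix,
                  index=range(1, n_rows + 1),
                  columns=range(1, n_cols + 1))
print(df)
```

Building the whole array first and constructing the DataFrame once avoids the per-cell DataFrame assignment that makes the loop in the question slow.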
