Reputation: 654
I have 180,000 pandas Series that I need to combine into one DataFrame. Adding them one by one takes a lot of time, apparently because appending gets increasingly slow as the DataFrame grows. The same problem persists even with NumPy, which is otherwise faster than pandas at this.
What could be an even better way to create a DataFrame from the Series?
Edit: some more background info. The Series were stored in a list. It is sports data, and the list, called player_library, holds 180,000+ items. I didn't realise that it is enough to write just
pd.concat(player_library, axis=1)
instead of listing all the individual items. Now it works quickly and nicely.
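For reference, a minimal sketch of that fast path (the Series contents below are made up; only the name player_library comes from the question):

```python
import pandas as pd

# Stand-in for the real data: a list of Series
# (the actual player_library held 180,000+ of them).
player_library = [pd.Series(range(5), name=f"player_{i}") for i in range(1_000)]

# A single concat builds the whole DataFrame in one pass,
# one column per Series.
df = pd.concat(player_library, axis=1)
```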
Upvotes: 0
Views: 329
Reputation: 1
Input:

import pandas as pd

series = pd.Series(["BMW", "Toyota", "Honda"])
series

Output:

0       BMW
1    Toyota
2     Honda
dtype: object
Input:

colours = pd.Series(["Red", "Blue", "White"])
colours

Output:

0      Red
1     Blue
2    White
dtype: object
Input:

car_data = pd.DataFrame({"Car make": series, "Colour": colours})
car_data

Output:

| | Car make | Colour |
|---|---|---|
| 0 | BMW | Red |
| 1 | Toyota | Blue |
| 2 | Honda | White |
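The same dict-of-Series construction also scales to a whole list of Series. A sketch, assuming the Series live in a list (the data and the col_ labels here are illustrative, not from the question):

```python
import pandas as pd

# Illustrative list of Series standing in for the real data.
series_list = [pd.Series([1, 2, 3]) for _ in range(4)]

# Dict keys become the column labels, just like "Car make"/"Colour" above.
df = pd.DataFrame({f"col_{i}": s for i, s in enumerate(series_list)})
```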
Upvotes: -1
Reputation: 5183
You could try `pd.concat` instead of `append`.
If you want each Series to be a column, then:

df = pd.concat(list_of_series_objects, axis=1)
For more detail on why it is expensive to iterate and append, read this question.
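To make the cost difference concrete, here is a small sketch with made-up data. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the slow path below grows the frame column by column instead, which suffers from the same incremental-growth problem.)

```python
import pandas as pd

series_list = [pd.Series(range(3), name=i) for i in range(1_000)]

# Slow: inserting columns one at a time forces pandas to reorganise
# (and often copy) its internal blocks as the frame grows.
df_slow = pd.DataFrame()
for s in series_list:
    df_slow[s.name] = s

# Fast: a single concat allocates the result once.
df_fast = pd.concat(series_list, axis=1)
```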
Upvotes: 2