I have a section of Python code as shown:
# Main loop that takes the values attributed to each row and sorts
# them into the corresponding columns by matching 'Name' against the newly
# generated column names.
listed_names = list(df_cv)  # List of column names to reference later.
variable = listed_names[3:]  # Columns from index 3 onward; Time, Name and Value (columns 1-3) are skipped.
for i in df_cv.index:  # For each row index in the DataFrame (DF)
    for m in variable:  # For each of the generated name columns
        if df_cv.loc[i, 'Name'] == m:  # If this row's Name equals the column name...
            df_cv.loc[i, m] = df_cv.loc[i, 'Value']  # ...copy this row's 'Value' into that column
Basically, it takes an n-by-3 table of Time/Name/Value and reshapes it into a pandas DataFrame with n rows and one column per unique Name, turning this:
Time  Name   Value
1     Color  Red
2     Age    6
3     Temp   25
4     Age    1
Into this:
Time  Color  Age  Temp
1     Red
2            6
3                 25
4            1
My code takes a terribly long time to run, and I wanted to know if there is a better way to set up my loops. I come from a MATLAB background, so the Python style (i.e., not using rows/columns for everything) is still alien to me.
How can I make this section of code run faster?
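Much of the cost in the loop above likely comes from doing one scalar `.loc` lookup per (row, column) pair. As an intermediate step that keeps the question's layout, a minimal sketch: loop only over the name columns and fill all matching rows at once with a boolean mask. The sample data (with Value stored as strings) and the step that creates the blank name columns are assumptions standing in for the question's `df_cv`.

```python
import pandas as pd

# Sample data from the question; Value is stored as strings (an assumption,
# since the question mixes text and numbers in one column).
df_cv = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", "6", "25", "1"],
})

# Assumed stand-in for the question's "newly generated" columns:
# one blank column per unique Name, appended after Time/Name/Value.
for name in df_cv["Name"].unique():
    df_cv[name] = ""

# Loop over the name columns only; each pass fills every matching row
# at once via a boolean mask instead of testing each cell individually.
for m in df_cv.columns[3:]:
    mask = df_cv["Name"] == m
    df_cv.loc[mask, m] = df_cv.loc[mask, "Value"]

print(df_cv)
```

This cuts the Python-level iterations from rows × columns down to one per unique name, but it still loops; the reshaping approach below avoids the loop entirely.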
Instead of looping, think of it as a pivot operation. Assuming that Time is a column and not the index (if it is, just call reset_index() first):
In [96]: df
Out[96]:
   Time   Name  Value
0     1  Color    Red
1     2    Age      6
2     3   Temp     25
3     4    Age      1
In [97]: df.pivot(index="Time", columns="Name", values="Value")
Out[97]:
Name   Age  Color  Temp
Time
1     None    Red  None
2        6   None  None
3     None   None    25
4        1   None  None
In [98]: df.pivot(index="Time", columns="Name", values="Value").fillna("")
Out[98]:
Name  Age  Color  Temp
Time
1            Red
2       6
3                   25
4       1
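The session above can be reproduced as a standalone script. This is a sketch assuming Value is stored as strings; it also covers the case where Time is the index, using the reset_index() route mentioned at the start of this answer.

```python
import pandas as pd

# Sample data from the question; Value stored as strings (an assumption).
df = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", "6", "25", "1"],
})

# One row per Time, one column per unique Name (columns come out sorted);
# cells with no value are missing, and fillna("") blanks them for display.
wide = df.pivot(index="Time", columns="Name", values="Value").fillna("")
print(wide)

# If Time is the index rather than a column, reset_index() moves it back
# into a column so pivot can use it.
df_indexed = df.set_index("Time")
wide_from_index = (
    df_indexed.reset_index()
    .pivot(index="Time", columns="Name", values="Value")
    .fillna("")
)
print(wide_from_index.equals(wide))
```

Alternatively, omitting `index=` makes `pivot` use the existing index directly, which avoids the reset_index() step.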
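This should be much faster on real datasets, because the reshaping happens inside pandas instead of in a Python-level double loop over every row and column, and it is simpler to boot.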
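One caveat worth checking on real data: `pivot` only works when each (Time, Name) pair occurs once, and raises a `ValueError` otherwise. If the data can contain repeats, `pivot_table` with an explicit `aggfunc` is one way around it. A minimal sketch with made-up data (the duplicated Age reading at Time 2 is hypothetical):

```python
import pandas as pd

# Hypothetical data with a repeated (Time, Name) pair: Age appears twice at Time 2.
df = pd.DataFrame({
    "Time": [1, 2, 2, 3],
    "Name": ["Color", "Age", "Age", "Temp"],
    "Value": ["Red", "6", "7", "25"],
})

try:
    df.pivot(index="Time", columns="Name", values="Value")
except ValueError as err:
    # pivot refuses to reshape when an index/column pair is duplicated.
    print("pivot failed:", err)

# pivot_table aggregates duplicates instead; aggfunc="first" keeps the first
# value seen for each (Time, Name) pair and also works for string values.
wide = df.pivot_table(index="Time", columns="Name", values="Value", aggfunc="first")
print(wide.fillna(""))
```

`aggfunc="last"` would keep the most recent value instead; which rule is appropriate depends on what the duplicates mean in your data.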