Sultanust
Sultanust

Reputation: 33

Iterate through df rows faster

I am trying to iterate through rows of a Pandas df to get data from one column of the row, and using that data to add new columns. The code is listed below but it is VERY slow. Is there any way to do what I am trying to do without iterating thru the individual rows of the dataframe?

ctqparam = []
wwy = []
ww = []
for index, row in df.iterrows():
    date = str(row['Event_Start_Time'])
    day = int(date[8] + date[9])
    month = int(date[5] + date[6])
    total = 0
    for i in range(0, month-1):
        total += months[i]
    total += day
    out = total // 7
    ww += [out]
    wwy += [str(date[0] + date[1] + date[2] + date[3])]

    val = str(row['TPRev'])
    out = ""
    for letter in val:
        if letter != '.':
            out += letter
    df.replace(to_replace=row['TPRev'], value=str(out), inplace = True)

    val = str(row['Subtest'])
    if val in ctqparam_dict.keys():
        ctqparam += [ctqparam_dict[val]]

# add WWY column, WW column, and correct data format of Test_Tape column
df.insert(0, column='Work_Week_Year', value = wwy)
df.insert(3, column='Work_Week', value = ww)
df.insert(4, column='ctqparam', value = ctqparam)

Upvotes: 1

Views: 251

Answers (1)

MDR
MDR

Reputation: 2670

It's hard to say exactly what your trying to do. However, if you're looping through rows chances are that there is a better way to do it.

For example, given a csv file that looks like this..

Event_Start_Time,TPRev,Subtest
4/12/19 06:00,"this. string. has dots.. in it.",{'A_Dict':'maybe?'}
6/10/19 04:27,"another stri.ng wi.th d.ots.",{'A_Dict':'aVal'}

You may want to:

  1. Format Event_Start_Time as datetime.
  2. Get the week number from Event_Start_Time.
  3. Remove all the dots (.) from the strings in column TPRev.
  4. Expand a dictionary contained in Subtest to its own column.

Without looping through the rows, consider doing thing by columns. Like doing it to the first 'cell' of the column and it replicates all the way down.

Code:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

     Event_Start_Time    TPRev                              Subtest
0    4/12/19 06:00       this. string. has dots.. in it.    {'A_Dict':'maybe?'}
1    6/10/19 04:27       another stri.ng wi.th d.ots.       {'A_Dict':'aVal'}


# format 'Event_Start_Time' as as datetime
df['Event_Start_Time'] = pd.to_datetime(df['Event_Start_Time'], format='%d/%m/%y %H:%M')

# get the week number from 'Event_Start_Time'
df['Week_Number'] = df['Event_Start_Time'].dt.isocalendar().week

# replace all '.' (periods) in the 'TPRev' column
df['TPRev'] = df['TPRev'].str.replace('.', '', regex=False)

# get a dictionary string out of column 'Subtest' and put into a new column
df = pd.concat([df.drop(['Subtest'], axis=1), df['Subtest'].map(eval).apply(pd.Series)], axis=1)

print(df)

     Event_Start_Time      TPRev                       Week_Number    A_Dict
0    2019-12-04 06:00:00   this string has dots in it  49             maybe?
1    2019-10-06 04:27:00   another string with dots    40             aVal


print(df.info())

Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Event_Start_Time  2 non-null      datetime64[ns]
 1   TPRev             2 non-null      object        
 2   Week_Number       2 non-null      UInt32        
 3   A_Dict            2 non-null      object        
dtypes: UInt32(1), datetime64[ns](1), object(2)

So you end up with a dataframe like this...

     Event_Start_Time      TPRev                       Week_Number    A_Dict
0    2019-12-04 06:00:00   this string has dots in it  49             maybe?
1    2019-10-06 04:27:00   another string with dots    40             aVa

Obviously you'll probably want to do other things. Look at your data. Make a list of what you want to do to each column or what new columns you need. Don't mention how right now as chances are it's possible and has been done before - you just need to find the existing method.

You may write down get the difference in days from the current row and the row beneath etc.). Finally search out how to do the formatting or calculation you require. Break the problem down.

Upvotes: 1

Related Questions