LOR13

Reputation: 83

Append rows to a dataframe efficiently

I have a dataframe that looks like this

import pandas as pd

df = pd.DataFrame({'Timestamp': ['1642847484', '1642847484', '1642847484', '1642847484', '1642847487', '1642847487','1642847487','1642847487','1642847487','1642847487','1642847487','1642847487', '1642847489', '1642847489', '1642847489'],
                   'value': [11, 10, 14, 20, 3, 2, 9, 48, 5, 20, 12, 20, 56, 12, 8]})

I need to do some operations on each group of values with the same timestamp, so I use groupby as follows:

df_grouped = df.groupby('Timestamp')

And then iterate over the rows of each group, appending the results row by row to a new dataframe:

df_out = pd.DataFrame(columns=('Timestamp', 'value'))
for group_name, df_group in df_grouped:
    i = 0
    for row_index, row in df_group.iterrows():
        row['Timestamp'] = row['Timestamp'] * 1000 + i * 30
        df_out = df_out.append(row)
        i = i+1
    print(df_out.tail())

But my approach takes too much time (7M+ rows) and I was wondering if there is a more efficient way to do this. Thank you

Upvotes: 1

Views: 154

Answers (1)

jezrael

Reputation: 862711

I think iterrows is not necessary here, you can use:

def f(x):
    x['Timestamp'] = ...
    ...
    return x

df1 = df.groupby('Timestamp').apply(f)
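For instance, f could apply the timestamp shift from the question directly (a sketch on a small subset of the sample data; the astype cast is needed because the Timestamp column holds strings, and group_keys=False keeps the original flat index):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Timestamp': ['1642847484', '1642847484', '1642847487'],
                   'value': [11, 10, 3]})

def f(x):
    # cast the string timestamps to int64, convert to milliseconds,
    # and spread rows within the same group 30 ms apart
    x = x.copy()
    x['Timestamp'] = (x['Timestamp'].astype(np.int64) * 1000
                      + np.arange(len(x)) * 30)
    return x

df1 = df.groupby('Timestamp', group_keys=False).apply(f)
print(df1)
```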

EDIT: Create a counter Series by GroupBy.cumcount, multiply it and add to Timestamp:

#if necessary - Timestamp is stored as strings in the sample data
import numpy as np

df['Timestamp'] = df['Timestamp'].astype(np.int64)

df['Timestamp'] = df['Timestamp'] * 1000 + df.groupby('Timestamp').cumcount() * 30
print(df)
        Timestamp  value
0   1642847484000     11
1   1642847484030     10
2   1642847484060     14
3   1642847484090     20
4   1642847487000      3
5   1642847487030      2
6   1642847487060      9
7   1642847487090     48
8   1642847487120      5
9   1642847487150     20
10  1642847487180     12
11  1642847487210     20
12  1642847489000     56
13  1642847489030     12
14  1642847489060      8
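Putting it together as a runnable example on the full sample data from the question (the right-hand side is computed in a single vectorized pass, with no Python-level loop over the rows):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Timestamp': ['1642847484'] * 4 + ['1642847487'] * 8 + ['1642847489'] * 3,
                   'value': [11, 10, 14, 20, 3, 2, 9, 48, 5, 20, 12, 20, 56, 12, 8]})

# cast string timestamps to integers, then shift each row in a
# group by 30 ms per position via the per-group cumulative count
df['Timestamp'] = df['Timestamp'].astype(np.int64)
df['Timestamp'] = df['Timestamp'] * 1000 + df.groupby('Timestamp').cumcount() * 30
print(df)
```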

Upvotes: 2
