Luca91

Reputation: 609

Pandas: repeat n times value of each n rows

I have a dataframe with 2563199 rows. It looks like this:

          index    dtm        f
      0     0   00:00:00    50.065
      1     1   00:00:01    50.061
      2     2   00:00:02    50.058
      3     3   00:00:03    50.049
      4     4   00:00:04    50.044
      5     5   00:00:05    50.044
      6     6   00:00:06    50.042
      7     7   00:00:07    50.042
    ....................
   2591997  2591997 23:59:57    50.009
   2591998  2591998 23:59:58    50.008
   2591999  2591999 23:59:59    50.006

I would like to create a new column that takes the value found at every n-th row and repeats it n times. For instance, with n = 4, the value in the 4th row (50.049) should be repeated across the first 4 rows, the value in the 8th row (50.042) across the next 4, and so on. (It doesn't matter if the length of the dataframe is not an exact multiple of n.) Like below:

          index   dtm         f
      0     0   00:00:00    50.049
      1     1   00:00:01    50.049
      2     2   00:00:02    50.049          
      3     3   00:00:03    50.049
      4     4   00:00:04    50.042
      5     5   00:00:05    50.042
      6     6   00:00:06    50.042
      7     7   00:00:07    50.042

I tried this with n = 86400:

arr = np.arange(len(df)) // 86400
for x in arr:
    df['value']=df['f'].iloc[x+86400]

Any ideas? Thank you!

Upvotes: 2

Views: 4039

Answers (3)

Thomas Arildsen

Reputation: 1290

Here is a solution without explicit loops, relying only on Pandas:

import pandas as pd

N = 4  # Repetition multiplier

# Create a small example DataFrame
df = pd.DataFrame({'value': range(12)}).astype(float)

# Prepare a new column for repeated values by picking only every N'th value
# from the 'value' column (rows 0, N, 2N, ...); index alignment leaves NaN
# in all other rows
df['repeated'] = df['value'].iloc[::N]

# Repeat the values by forward-filling the missing values; assign the result
# back, since ffill(inplace=True) on a selected column may not update df
df['repeated'] = df['repeated'].ffill()
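Note that the snippet above repeats the first value of each block of N rows, whereas the question asks for the last one. A minimal sketch of the same idea adapted to that requirement, assuming the same 12-row example frame, shifts the slice to start at position N - 1 and fills backwards instead of forwards:

```python
import pandas as pd

N = 4  # Block length

# Same small example frame as above
df = pd.DataFrame({'value': range(12)}).astype(float)

# Keep only the last row of each block of N (positions N-1, 2N-1, ...);
# index alignment leaves NaN in every other row.
df['repeated'] = df['value'].iloc[N - 1::N]

# Back-fill so that every row in a block takes the value of the block's last row.
df['repeated'] = df['repeated'].bfill()

print(df)
```

Rows after the last complete block, if any, stay NaN with this variant, which the question says is acceptable.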

Upvotes: 0

yatu

Reputation: 88226

Here's a way to do it avoiding looping over the df.

First set n, then build a list of the existing index labels, excluding the rows whose values will be repeated in f:

n=4
ix = [x for i, x in enumerate(df.index.values) if (i + 1) % n != 0]
print(ix)
[0, 1, 2, 4, 5, 6]

Now set f in these rows to np.nan and use bfill (note that this overwrites f in place):

import numpy as np

df.loc[ix, 'f'] = np.nan
df['f'] = df.f.bfill()

print(df)
    index       dtm       f
0      0  00:00:00  50.049
1      1  00:00:01  50.049
2      2  00:00:02  50.049
3      3  00:00:03  50.049
4      4  00:00:04  50.042
5      5  00:00:05  50.042
6      6  00:00:06  50.042
7      7  00:00:07  50.042
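If the original f column should be preserved, the same mask-and-back-fill idea can be written without the Python-level list comprehension. The following is only a sketch under assumptions: the column name fnew and the helper variable is_block_end are illustrative, and the frame is rebuilt from the first eight values shown in the question.

```python
import numpy as np
import pandas as pd

# First eight values of 'f' from the question
df = pd.DataFrame({'f': [50.065, 50.061, 50.058, 50.049,
                         50.044, 50.044, 50.042, 50.042]})
n = 4

# True on the last row of every block of n rows (positions n-1, 2n-1, ...)
is_block_end = (np.arange(len(df)) + 1) % n == 0

# Keep f only at the block ends (NaN elsewhere), then back-fill.
# The result goes into a new column, so 'f' itself is left untouched.
df['fnew'] = df['f'].where(is_block_end).bfill()

print(df)
```

Because the mask is built from row positions rather than index labels, this does not depend on the index being a default RangeIndex.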

Upvotes: 3

ALollz

Reputation: 59519

Use NumPy and array slicing:

import numpy as np

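n = 4
# Repeat the last value of each complete block n times, then pad the
# incomplete trailing block (if any) with NaN
df['fnew'] = np.concatenate([np.repeat(df.f.values[n-1::n], n),
                             np.repeat(np.nan, len(df)%n)])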

Output:

n = 3
   index       dtm       f    fnew
0      0  00:00:00  50.065  50.058
1      1  00:00:01  50.061  50.058
2      2  00:00:02  50.058  50.058
3      3  00:00:03  50.049  50.044
4      4  00:00:04  50.044  50.044
5      5  00:00:05  50.044  50.044
6      6  00:00:06  50.042     NaN
7      7  00:00:07  50.042     NaN

n = 4
   index       dtm       f    fnew
0      0  00:00:00  50.065  50.049
1      1  00:00:01  50.061  50.049
2      2  00:00:02  50.058  50.049
3      3  00:00:03  50.049  50.049
4      4  00:00:04  50.044  50.042
5      5  00:00:05  50.044  50.042
6      6  00:00:06  50.042  50.042
7      7  00:00:07  50.042  50.042

n = 5
   index       dtm       f    fnew
0      0  00:00:00  50.065  50.044
1      1  00:00:01  50.061  50.044
2      2  00:00:02  50.058  50.044
3      3  00:00:03  50.049  50.044
4      4  00:00:04  50.044  50.044
5      5  00:00:05  50.044     NaN
6      6  00:00:06  50.042     NaN
7      7  00:00:07  50.042     NaN
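If the leftover rows at the end should carry a value rather than NaN, one alternative (a sketch, not part of the approach above) is to label the blocks with integer division, as the question's own attempt started to do, and take the last value of each block with groupby. The variable name block is illustrative, and the frame is rebuilt from the first eight values shown in the question.

```python
import numpy as np
import pandas as pd

# First eight values of 'f' from the question
df = pd.DataFrame({'f': [50.065, 50.061, 50.058, 50.049,
                         50.044, 50.044, 50.042, 50.042]})
n = 3

# Block label for every row: 0, 0, 0, 1, 1, 1, 2, 2
block = np.arange(len(df)) // n

# Broadcast the last value of each block to all rows of that block.
# The incomplete trailing block uses its own last available value.
df['fnew'] = df.groupby(block)['f'].transform('last')

print(df)
```

One caveat: the 'last' aggregation returns the last non-null value in each group, so if f itself contains NaN the result can differ from strictly taking the final row of each block.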

Upvotes: 2
