Reputation: 609
I have a dataframe with 2563199 rows. It looks like:
index dtm f
0 0 00:00:00 50.065
1 1 00:00:01 50.061
2 2 00:00:02 50.058
3 3 00:00:03 50.049
4 4 00:00:04 50.044
5 5 00:00:05 50.044
6 6 00:00:06 50.042
7 7 00:00:07 50.042
....................
2591997 2591997 23:59:57 50.009
2591998 2591998 23:59:58 50.008
2591999 2591999 23:59:59 50.006
I would like to create a new column that takes the value in every n-th row and repeats it n times. For instance, with n = 4, the value 50.049 in the 4th row is repeated in that row and the 3 rows before it, the value 50.042 in the 8th row fills the next block of 4 rows, and so on. (It doesn't matter if the length of the dataframe isn't an exact multiple of n.) Like below:
index dtm f
0 0 00:00:00 50.049
1 1 00:00:01 50.049
2 2 00:00:02 50.049
3 3 00:00:03 50.049
4 4 00:00:04 50.042
5 5 00:00:05 50.042
6 6 00:00:06 50.042
7 7 00:00:07 50.042
I was trying every 86400 rows:
arr = np.arange(len(df)) // 86400
for x in arr:
df['value']=df['f'].iloc[x+86400]
Any idea? Thank you!
Upvotes: 2
Views: 4039
Reputation: 1290
Here is a solution without explicit loops, relying only on Pandas:
import pandas as pd

N = 4  # Repetition multiplier
# Create a small example DataFrame
df = pd.DataFrame({'value': range(12)}).astype(float)
# Prepare a new column by picking only every N-th value from the 'value' column;
# index alignment leaves NaN in the rows in between
df['repeated'] = df['value'].iloc[::N]
# Repeat the values by forward-filling the gaps. Assign the result back:
# ffill(inplace=True) on a selected column may not update df in recent pandas
df['repeated'] = df['repeated'].ffill()
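Note that the slice `iloc[::N]` keeps the first value of each block and fills it forward. The question asks for the opposite layout, where the last value of each block fills the rows before it. A minimal sketch of that variant, assuming the same toy DataFrame and a made-up column name `repeated_last`, could look like this:

```python
import pandas as pd

N = 4
df = pd.DataFrame({'value': range(12)}).astype(float)

# Keep only the last value of each block of N rows (positions N-1, 2N-1, ...);
# index alignment leaves NaN in all other rows
df['repeated_last'] = df['value'].iloc[N - 1::N]
# Back-fill so each block's last value is copied into the rows before it
df['repeated_last'] = df['repeated_last'].bfill()
```

If the length of the frame is not a multiple of N, the rows after the last complete block stay NaN.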
Upvotes: 0
Reputation: 88226
Here's a way to do it that avoids looping over the df. First set n, and generate a list of the existing indices, excluding the rows whose values in f will be repeated:
n=4
ix = [x for i, x in enumerate(df.index.values) if (i + 1) % n != 0]
print(ix)
[0, 1, 2, 4, 5, 6]
Now set these values to np.nan and use bfill:
df.loc[ix, 'f'] = np.nan
df['f'] = df.f.bfill()
print(df)
index dtm f
0 0 00:00:00 50.049
1 1 00:00:01 50.049
2 2 00:00:02 50.049
3 3 00:00:03 50.049
4 4 00:00:04 50.042
5 5 00:00:05 50.042
6 6 00:00:06 50.042
7 7 00:00:07 50.042
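For a frame with millions of rows, the list comprehension could probably be replaced by a boolean mask built from row positions. The following is only a sketch: it rebuilds the 8-row example by hand and writes to a hypothetical new column `f_rep`, so that `f` itself is left untouched.

```python
import numpy as np
import pandas as pd

n = 4
df = pd.DataFrame({'f': [50.065, 50.061, 50.058, 50.049,
                         50.044, 50.044, 50.042, 50.042]})

# True on every n-th row by position (rows 3, 7, ...), False elsewhere
keep = (np.arange(len(df)) + 1) % n == 0
# Replace the other rows with NaN, then back-fill from the next kept row
df['f_rep'] = df['f'].where(keep).bfill()
```

Because the mask is positional, this does not depend on the index being a clean 0..N-1 range.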
Upvotes: 3
Reputation: 59519
Use numpy and array slicing:
import numpy as np
n = 4
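# Repeat every n-th value n times, then pad the incomplete last block with NaN
# (np.nan rather than np.NaN, which was removed in NumPy 2.0)
df['fnew'] = np.concatenate([np.repeat(df.f.values[n-1::n], n),
                             np.repeat(np.nan, len(df) % n)])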
n = 3
index dtm f fnew
0 0 00:00:00 50.065 50.058
1 1 00:00:01 50.061 50.058
2 2 00:00:02 50.058 50.058
3 3 00:00:03 50.049 50.044
4 4 00:00:04 50.044 50.044
5 5 00:00:05 50.044 50.044
6 6 00:00:06 50.042 NaN
7 7 00:00:07 50.042 NaN
n = 4
index dtm f fnew
0 0 00:00:00 50.065 50.049
1 1 00:00:01 50.061 50.049
2 2 00:00:02 50.058 50.049
3 3 00:00:03 50.049 50.049
4 4 00:00:04 50.044 50.042
5 5 00:00:05 50.044 50.042
6 6 00:00:06 50.042 50.042
7 7 00:00:07 50.042 50.042
n = 5
index dtm f fnew
0 0 00:00:00 50.065 50.044
1 1 00:00:01 50.061 50.044
2 2 00:00:02 50.058 50.044
3 3 00:00:03 50.049 50.044
4 4 00:00:04 50.044 50.044
5 5 00:00:05 50.044 NaN
6 6 00:00:06 50.042 NaN
7 7 00:00:07 50.042 NaN
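If the trailing NaN rows are not wanted, one alternative (a different technique from the slicing above, using a pandas groupby) is to label each row with its block number and broadcast the last value of each block. This is a sketch on a hand-built copy of the 8-row example; the column name `fnew_filled` is made up for illustration.

```python
import numpy as np
import pandas as pd

n = 3
df = pd.DataFrame({'f': [50.065, 50.061, 50.058, 50.049,
                         50.044, 50.044, 50.042, 50.042]})

# Block label for each row by position: 0, 0, 0, 1, 1, 1, 2, 2
block = np.arange(len(df)) // n
# Broadcast the last value of each block to all rows of that block;
# the incomplete final block gets its own last value instead of NaN
df['fnew_filled'] = df.groupby(block)['f'].transform('last')
```

With n = 3 the first six rows match the table above, and the last two rows take 50.042 rather than NaN.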
Upvotes: 2