Reputation: 59
I have a very large dataset (test) of approximately 1 million rows. I want to update a column ('Date') in the dataset so that it contains only these 3 dates:
2014-04-01, 2014-05-01, 2014-06-01
So there is one date per row, and the dates repeat every 3 rows.
I have tried this:
for i in range(0,len(test),3):
    if(i <= len(test)):
        test['Date'][i] = '2014-04-01'
        test['Date'][i+1] = '2014-05-01'
        test['Date'][i+2] = '2014-06-01'
I am getting this warning:
__main__:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
__main__:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
__main__:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have gone through the link but was not able to solve my issue. I have also googled it and found some suggestions, such as calling copy() on the dataset before slicing, but nothing worked.
Upvotes: 0
Views: 57
Reputation: 19885
I believe what you want is np.tile:
import numpy as np
import pandas as pd
dates = pd.Series(['2014-04-01', '2014-05-01', '2014-06-01'], dtype='datetime64[ns]')
# Repeat the 3 dates enough times to cover every row, then trim to len(df)
repeated_dates = np.tile(dates, len(df) // 3 + 1)[:len(df)]
df['dates'] = repeated_dates
This creates a NumPy array containing the repeated values and assigns it to a column of your dataframe.
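The np.tile idea can be sketched end to end on a small frame. This is only an illustration: the 7-row test frame and its value column are made-up stand-ins for the real 1-million-row dataset, and the ceiling division is one way to pick the repeat count so the tiled array is at least as long as the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the large dataset in the question.
test = pd.DataFrame({"value": range(7)})

# The three dates as a NumPy datetime64 array.
dates = pd.to_datetime(["2014-04-01", "2014-05-01", "2014-06-01"]).to_numpy()

# Ceiling division: smallest repeat count that covers every row.
n_repeats = -(-len(test) // len(dates))

# Tile the pattern, trim the excess, and assign the whole column at once.
test["Date"] = np.tile(dates, n_repeats)[: len(test)]

print(test)
```

Because the column is assigned in a single statement, no row-by-row loop is involved, which matters at a million rows.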
Upvotes: 2
Reputation: 75080
You can also look at itertools islice and cycle, which let you cycle a list or Series across the length of the dataframe:
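from itertools import islice,cycle
import numpy as np
import pandas as pd
dates = pd.Series(['2014-04-01', '2014-05-01', '2014-06-01'], dtype='datetime64[ns]')
# Example dataframe: 10 rows of random integers
df = pd.DataFrame(np.random.randint(0,50,50).reshape(10,5))
# cycle repeats the dates endlessly; islice takes exactly len(df) of them
df['dates'] = list(islice(cycle(dates),len(df)))
print(df)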
    0   1   2   3   4      dates
0  45   3  13  24  13 2014-04-01
1  30  44   6  17  24 2014-05-01
2  47  22  16  28  12 2014-06-01
3  11  13  10   0  47 2014-04-01
4  32  12  49  14   2 2014-05-01
5  15   6  21  17  49 2014-06-01
6  49  49  28  18   9 2014-04-01
7  18  35  35  40   7 2014-05-01
8  44  15  13  49  28 2014-06-01
9   9  14  36  36   6 2014-04-01
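Assigning the whole column in one statement, as above, also avoids the SettingWithCopyWarning from the question, which comes from chained indexing such as test['Date'][i] = .... If only some rows need updating, a minimal sketch with one .loc call per date might look like this. The small test frame and its placeholder dates are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical small frame; 'Date' starts with placeholder values.
test = pd.DataFrame({"value": range(8), "Date": ["1900-01-01"] * 8})

dates = ["2014-04-01", "2014-05-01", "2014-06-01"]

# One .loc assignment per date: rows at positions k, k+3, k+6, ...
# receive dates[k]. Row labels and the column are selected in a single
# indexing step, so there is no chained indexing.
for k, d in enumerate(dates):
    test.loc[test.index[k::len(dates)], "Date"] = d

print(test)
```

This runs three vectorized assignments instead of one per row, and it also copes with a row count that is not a multiple of 3.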
Upvotes: 1