Arun Rajora

Reputation: 59

Pandas with SettingWithCopyWarning

I have a very large dataset (test) of approximately 1 million rows. I want to update one column ('Date') of the dataset so that it contains only 3 dates:

2014-04-01, 2014-05-01, 2014-06-01

Each row should hold one date, with the three dates repeating every 3 rows.

I have tried this:

for i in range(0,len(test),3):
    if(i <= len(test)):
       test['Date'][i] = '2014-04-01'
       test['Date'][i+1] = '2014-05-01'
       test['Date'][i+2] = '2014-06-01'

I am getting this warning:

__main__:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
__main__:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
__main__:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I have gone through the link but was not able to solve my issue. I also googled it and found some suggestions, such as calling copy() on the dataset before slicing, but nothing worked.

Upvotes: 0

Views: 57

Answers (2)

gmds

Reputation: 19885

I believe what you want is np.tile:

import numpy as np
import pandas as pd

dates = pd.Series(['2014-04-01', '2014-05-01', '2014-06-01'], dtype='datetime64[ns]')

# Repeat the three dates enough times to cover every row, then trim to len(df)
repeated_dates = np.tile(dates, len(df) // 3 + 1)[:len(df)]

df['dates'] = repeated_dates

This creates an array of repeated values and assigns it to a column of your dataframe.
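For a self-contained illustration, here is a minimal sketch on a small frame. The 7-row `test` frame and its `value` column are made up for the example; only the `Date` column name and the three dates come from the question. Since the whole column is assigned in a single step, there is no chained `test['Date'][i] = ...` indexing, which is what typically triggers SettingWithCopyWarning.

```python
import numpy as np
import pandas as pd

# Hypothetical 7-row stand-in for the asker's `test` DataFrame.
test = pd.DataFrame({"value": range(7)})

# The three dates to repeat, parsed as datetime values.
dates = pd.to_datetime(["2014-04-01", "2014-05-01", "2014-06-01"])

# Repeat the pattern enough times to cover every row, then trim the excess.
n_repeats = len(test) // len(dates) + 1
test["Date"] = np.tile(dates.to_numpy(), n_repeats)[: len(test)]

print(test)
```

The length of the frame does not need to be a multiple of 3; the final slice simply cuts the last repetition short.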

Upvotes: 2

anky

Reputation: 75080

You can also look at islice and cycle from itertools, which let you cycle a list or Series across the length of the dataframe:

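from itertools import islice, cycle

import numpy as np
import pandas as pd

dates = pd.Series(['2014-04-01', '2014-05-01', '2014-06-01'], dtype='datetime64[ns]')
df = pd.DataFrame(np.random.randint(0, 50, 50).reshape(10, 5))

# cycle() repeats the dates endlessly; islice() takes the first len(df) of them
df['dates'] = list(islice(cycle(dates), len(df)))
print(df)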

    0   1   2   3   4      dates
0  45   3  13  24  13 2014-04-01
1  30  44   6  17  24 2014-05-01
2  47  22  16  28  12 2014-06-01
3  11  13  10   0  47 2014-04-01
4  32  12  49  14   2 2014-05-01
5  15   6  21  17  49 2014-06-01
6  49  49  28  18   9 2014-04-01
7  18  35  35  40   7 2014-05-01
8  44  15  13  49  28 2014-06-01
9   9  14  36  36   6 2014-04-01

Upvotes: 1
