Haiyang Li

Reputation: 47

How to loop through a pandas DataFrame and conditionally assign values to a column?

I'm trying to loop through the 'vol' dataframe, and conditionally check if the sample_date is between certain dates. If it is, assign a value to another column.

Here's the code I have:

vol = pd.DataFrame(data=pd.date_range(start='11/3/2015', end='1/29/2019'))
vol.columns = ['sample_date']
vol['hydraulic_vol'] = np.nan
for i in vol.iterrows():
    if  pd.Timestamp('2015-11-03') <= vol.loc[i,'sample_date'] <= pd.Timestamp('2018-06-07'):
        vol.loc[i,'hydraulic_vol'] = 319779

Here's the error I received: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Upvotes: 1

Views: 4007

Answers (2)

Erfan

Reputation: 42926

Another way to do this is to use NumPy's np.where function in combination with the pandas .between method.

np.where works like this:
np.where(condition, value if true, value if false)

Code example

cond = vol.sample_date.between('2015-11-03', '2018-06-07')
vol['hydraulic_vol'] = np.where(cond, 319779, np.nan)

Or you can combine them in one single line of code:

vol['hydraulic_vol'] = np.where(vol.sample_date.between('2015-11-03', '2018-06-07'), 319779, np.nan)
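For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the 'vol' frame from the question and adds a count of the filled rows as a sanity check; the name n_filled is just something I made up for illustration. Note that .between is inclusive on both ends by default, so both boundary dates get the value.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
vol = pd.DataFrame({'sample_date': pd.date_range(start='11/3/2015', end='1/29/2019')})

# Boolean mask: True where sample_date falls inside the range (both ends inclusive).
cond = vol['sample_date'].between('2015-11-03', '2018-06-07')

# Where cond is True use 319779, otherwise NaN.
vol['hydraulic_vol'] = np.where(cond, 319779, np.nan)

# n_filled is a hypothetical helper name: how many rows received the value.
n_filled = int(vol['hydraulic_vol'].notna().sum())
print(n_filled)
```

Because NaN is a float, the resulting column is float64 rather than integer.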

Edit
I see that you're new here, so here's something I had to learn as well coming to python/pandas.

Looping over a dataframe should be your last resort. Try to use vectorized solutions instead (in this case .loc or np.where); they will be faster than looping.
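As for why the loop in the question fails: iterrows() yields (index, row) tuples, so in "for i in vol.iterrows()" the variable i is a tuple holding a Series, and passing that to .loc is what seems to trigger the "cannot be hashed" error. If you really do need a loop, a sketch of a working version is below, followed by a check that the vectorized mask marks the same rows. The names start, end, mask and matches are mine, not from the question.

```python
import numpy as np
import pandas as pd

vol = pd.DataFrame({'sample_date': pd.date_range(start='11/3/2015', end='1/29/2019')})
vol['hydraulic_vol'] = np.nan

start, end = pd.Timestamp('2015-11-03'), pd.Timestamp('2018-06-07')

# Unpack each (index label, row Series) tuple that iterrows() yields.
for idx, row in vol.iterrows():
    if start <= row['sample_date'] <= end:
        vol.loc[idx, 'hydraulic_vol'] = 319779

# Vectorized equivalent of the loop's condition, for comparison.
mask = vol['sample_date'].between(start, end)

# True if the loop filled exactly the rows the mask selects.
matches = bool((vol['hydraulic_vol'].notna() == mask).all())
print(matches)
```

The loop runs Python code once per row, while the mask is computed in a single pass over the column, which is where the speed difference comes from.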

Upvotes: 2

gold_cy

Reputation: 14236

This is how you would do it properly:

# Wrap the whole expression in parentheses so it can span two lines.
cond = ((pd.Timestamp('2015-11-03') <= vol.sample_date) &
        (vol.sample_date <= pd.Timestamp('2018-06-07')))

vol.loc[cond, 'hydraulic_vol'] = 319779
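One practical difference from np.where: assigning through .loc with a boolean mask only touches the rows the mask selects, so you can fill several date ranges one after another without overwriting earlier ones. A rough sketch follows; the second period and its value 100000 are made up purely for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

vol = pd.DataFrame({'sample_date': pd.date_range(start='11/3/2015', end='1/29/2019')})
vol['hydraulic_vol'] = np.nan

# First period, taken from the question.
first = vol['sample_date'].between('2015-11-03', '2018-06-07')
vol.loc[first, 'hydraulic_vol'] = 319779

# Hypothetical second period and value, for illustration only.
later = vol['sample_date'] > pd.Timestamp('2018-06-07')
vol.loc[later, 'hydraulic_vol'] = 100000

# Rows from the first period keep their value after the second assignment.
print(vol['hydraulic_vol'].value_counts())
```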

Upvotes: 6
