Reputation: 791
Below is an example dataframe
date_list = ['2014-01-01','2014-02-01','2014-03-01']
value_list = [100.00,98.00,102.00]
df = pd.DataFrame({'DATE' : date_list, 'VALUE' : value_list})
The VALUE
column is total monthly number and I want to change it to an average daily number
I can do it using this logic:
for x in range(len(df)):
if str(df.ix[x,0])[5:7] in ['01','03','05','07','08','10','12']:
df.ix[x,1] = df.ix[x,1]/31
# etc. for other months
It works but takes too much time.
If I go to http://pandas.pydata.org/pandas-docs/stable/cookbook.html#idioms, there are more elegant ways on how I can edit one column based on another
df.ix[df.AAA >= 5,'BBB'] = -1
, for example.
But I can't figure out how I can apply it in my case.
Upvotes: 1
Views: 211
Reputation: 10349
how I can edit one column based on another
The first step is to subset the data frame to the rows you want to apply changes to. For this you create a boolean mask, which contains for every row a True/False value. True means the row is selected.
mask = df['DATE'].map(lambda v: v[5:7] in ['01', '02', '03', '05', '07', '08', '10', '12']
Then apply the mask and update the value in one step
df.loc[mask, 'VALUE'] /= 31
If you convert the 'DATE' column into a datetime value, the code becomes even more compact:
df['DATE'] = pd.to_datetime(df['DATE'])
df.loc[df.DATE.dt.month.isin([1, 2, 3, 5, 7, 8, 10, 12]), 'VALUE'] /= 31
Pandas will use vectorized operations. Comparing your for loop with the .loc variant shows a 30% performance gain on my CPU.
Upvotes: 1
Reputation:
If you convert the column to datetime, you can use the dt accessor for vectorized operations. df['DATE'].dt.daysinmonth
returns the number of days in that particular month, for example.
df['DATE'] = pd.to_datetime(df['DATE'])
df['VALUE'] / df['DATE'].dt.daysinmonth
Out:
0 3.225806
1 3.500000
2 3.290323
dtype: float64
Upvotes: 4