user1912925
user1912925

Reputation: 781

Multiply dataframe rows based on a datetime condition

I have the following example dataframe:

df = pd.DataFrame(columns=['Total_concentration'], index=pd.to_datetime([]))
df.loc[pd.Timestamp(year=2017, month=1, day=1, hour=12)] = pd.Series([1], ['Total_concentration'])
df.loc[pd.Timestamp(year=2017, month=1, day=2, hour=12)] = pd.Series([2], ['Total_concentration'])
df.loc[pd.Timestamp(year=2017, month=1, day=3, hour=12)] = pd.Series([3], ['Total_concentration'])
df.loc[pd.Timestamp(year=2017, month=1, day=4, hour=12)] = pd.Series([1], ['Total_concentration'])
df.loc[pd.Timestamp(year=2017, month=1, day=5, hour=12)] = pd.Series([2], ['Total_concentration'])
df.loc[pd.Timestamp(year=2017, month=1, day=6, hour=12)] = pd.Series([3], ['Total_concentration'])

I would like to add a column "Total_flux" which is the "Total_concentration" column multiplied by 2 between the dates 2017-01-02 and 2017-01-03 and the "Total_concentration" column multiplied by 3 between the dates 2017-01-04 and 2017-01-05. I.e. the resulting dataframe would look like the following:

df2 = pd.DataFrame(columns=['Total_concentration','Total_flux'], index=pd.to_datetime([]))
df2.loc[pd.Timestamp(year=2017, month=1, day=1, hour=12)] = pd.Series([1,1], ['Total_concentration','Total_flux'])
df2.loc[pd.Timestamp(year=2017, month=1, day=2, hour=12)] = pd.Series([2,4], ['Total_concentration','Total_flux'])
df2.loc[pd.Timestamp(year=2017, month=1, day=3, hour=12)] = pd.Series([3,6], ['Total_concentration','Total_flux'])
df2.loc[pd.Timestamp(year=2017, month=1, day=4, hour=12)] = pd.Series([1,3], ['Total_concentration','Total_flux'])
df2.loc[pd.Timestamp(year=2017, month=1, day=5, hour=12)] = pd.Series([2,6], ['Total_concentration','Total_flux'])
df2.loc[pd.Timestamp(year=2017, month=1, day=6, hour=12)] = pd.Series([3,3], ['Total_concentration','Total_flux'])

What is the most effective way to do this?

Upvotes: 1

Views: 848

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 30991

Use the following code:

df['Total_flux'] = df.Total_concentration.multiply(pd.cut(df.index.to_series(),
    [pd.Timestamp('2017-01-02'), pd.Timestamp('2017-01-04'), pd.Timestamp('2017-01-06')],
    right=False, labels=[2, 3]).dropna().astype(int), fill_value=1)

The idea to compute the new column is to:

  • cut the index of df into bins:
    • between 2017-01-02 and 2017-01-04, with label 2,
    • between 2017-01-04 and 2017-01-06, with label 3 (both ranges open at the right edge),
  • convert the (Categorical) result to int,
  • multiply Total_concentration by the above result with fill_value of 1.

Upvotes: 1

jezrael
jezrael

Reputation: 863056

Solution with slicing DataetimeIndex, then missing values are replaced by original column:

df.loc['2017-01-02':'2017-01-03', 'Total_flux'] = df.loc['2017-01-01':'2017-01-03', 'Total_concentration'] * 2
df.loc['2017-01-04':'2017-01-05', 'Total_flux'] = df.loc['2017-01-04':'2017-01-05', 'Total_concentration'] * 3

df['Total_flux'] = df['Total_flux'].fillna(df['Total_concentration'])
print (df)
                    Total_concentration  Total_flux
2017-01-01 12:00:00                   1           1
2017-01-02 12:00:00                   2           4
2017-01-03 12:00:00                   3           6
2017-01-04 12:00:00                   1           3
2017-01-05 12:00:00                   2           6
2017-01-06 12:00:00                   3           3

Or is possible use numpy.select for values used for multiple with DatetimeIndex.normalize and Series.between:

s = df.index.normalize().to_series()
df['Total_flux'] = (np.select([s.between('2017-01-02','2017-01-03'),
                               s.between('2017-01-04','2017-01-05')],
                               [2,3], default=1) * df['Total_concentration'])
print (df)
                    Total_concentration Total_flux
2017-01-01 12:00:00                   1          1
2017-01-02 12:00:00                   2          4
2017-01-03 12:00:00                   3          6
2017-01-04 12:00:00                   1          3
2017-01-05 12:00:00                   2          6
2017-01-06 12:00:00                   3          3

Upvotes: 1

Related Questions