Reputation: 2007
I would like to perform some arithmetic calculations on columns, where I know only first character (number), which is common for some columns. As an output I would need to create another data frame
with the names that include the same character (number).
For example. I have a df1
with 5 columns, where some starts with number 1, another with 2 and so on. I would need to sum (multiply or any other function) all with 1,2 and create df2
with columns var1
, var2
(numbers corresponds to df1
)
MWE:
import pandas as pd
import numpy as np
index=pd.date_range('2014-1-1 00:00:00', '2014-12-31 23:50:00', freq='1h')
df1=pd.DataFrame(np.random.randn(len(index),5).cumsum(axis=0),columns=['1A','1B','C','2D','2E'],index=index)
My idea was to create an empty df2
and loop through df
applaying df.filter(regex=(i))
function but I don't know how to create new column names based on i
number.
df2=[]
for i in df1:
df_fil=[]
df_fil=df.filter(regex=(i))
df2['var'+i]=df_fil.sum()
I would be grateful for any suggestions.
EDIT
The desired output:
var1 var2 var3
2014-01-01 1A+1B+1D 2A+2B+2D 3A+3B+3D
2014-01-02 ....and so on.....
where the math function is performed for every row.
Upvotes: 1
Views: 69
Reputation: 37103
Here's a somewhat different solution.
df2 = [np.sum(df1[[h for h in df1.columns.values
if h.startswith(c)]], axis=1) for c in '123']
result = pd.DataFrame(df2, index=["var%s" % c for c in '123']).transpose()
It seems a little cumbersome even to me :)
Upvotes: 0
Reputation: 30434
I'm going to alter the MWE just a little bit, partially for brevity, and partially to have differing numbers of column per number:
index=pd.date_range('2014-1-1', '2014-1-10', freq='1D')
df1=pd.DataFrame(np.random.randn(len(index),6).cumsum(axis=0),
columns=['1A','1B','2C','2D','2E','3F'],index=index)
1A 1B 2C 2D 2E 3F
2014-01-01 1.614124 0.794855 -0.417129 0.219520 -1.488025 0.174294
2014-01-02 1.802256 0.004414 -2.286057 0.432783 -0.104455 1.194493
2014-01-03 2.603609 0.182026 -2.763507 -0.358307 -0.453031 2.188724
2014-01-04 2.661041 -0.911023 -3.193927 -0.541110 0.064825 2.263758
2014-01-05 2.784960 -2.114531 -3.206907 0.142064 -0.554215 1.726185
2014-01-06 1.162722 -1.841800 -2.940212 -0.909338 -1.327187 0.655045
2014-01-07 3.904791 -0.583724 -2.696215 0.560302 -2.839679 1.714807
2014-01-08 5.010886 0.528935 -2.815885 1.134565 -4.262055 -0.019529
2014-01-09 4.541217 -1.105712 -3.432948 1.110232 -5.955953 -0.859368
2014-01-10 4.031846 0.021889 -3.953261 0.958110 -6.615558 -1.304860
df2=pd.DataFrame( index=df1.index )
for i in list('123'):
df_fil=df1.filter(regex=i)
df2['var'+i] = df_fil.sum(axis=1)
var1 var2 var3
2014-01-01 2.408979 -1.685633 0.174294
2014-01-02 1.806670 -1.957730 1.194493
2014-01-03 2.785634 -3.574845 2.188724
2014-01-04 1.750019 -3.670212 2.263758
2014-01-05 0.670429 -3.619058 1.726185
2014-01-06 -0.679078 -5.176737 0.655045
2014-01-07 3.321067 -4.975593 1.714807
2014-01-08 5.539821 -5.943374 -0.019529
2014-01-09 3.435505 -8.278669 -0.859368
2014-01-10 4.053735 -9.610709 -1.304860
Upvotes: 3