Reputation: 73
I am trying to do something relatively simple in summing all columns in a pandas dataframe that contain a certain string. Then making that a new column in the dataframe from the sum. These columns are all numeric float values...
I can get the list of columns which contain the string I want
StmCol = [col for col in cdf.columns if 'Stm_Rate' in col]
But when I try to sum them using:
cdf['PadStm'] = cdf[StmCol].sum()
I get a new column full of "nan" values.
Upvotes: 3
Views: 11393
Reputation: 375535
You need to pass in axis=1 to .sum
, by default (axis=0) sums over each column:
In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
In [12]: df
Out[12]:
A B
0 1 2
1 3 4
In [13]: df[["A"]].sum() # Here I'm passing the list of columns ["A"]
Out[13]:
A 4
dtype: int64
In [14]: df[["A"]].sum(axis=1)
Out[14]:
0 1
1 3
dtype: int64
Only the latter matches the index of df:
In [15]: df["C"] = df[["A"]].sum()
In [16]: df["D"] = df[["A"]].sum(axis=1)
In [17]: df
Out[17]:
A B C D
0 1 2 NaN 1
1 3 4 NaN 3
Upvotes: 3