Reputation: 107
I have a dataframe with the following columns:
cols = group_dataframe.columns
print(cols)
Index(['TEST_TXT', 'count', 'mean', 'std', 'LSL', 'USL', 'median', 'Cp', 'CpK', 'Cpu', 'Cpl', 'min', 'max', '25%',
'50%', '75%'],
dtype='object')
I want to build a new dataframe that holds the mean over all rows for certain columns such as "mean", "std", "Cp" and "Cpu", but the minimum for the "min" column and the maximum for the "max" column. TEST_TXT should be left out of the processing.
My code looks like this -
new_df = pd.DataFrame()
new_df["Group"] = np.asarray(test_group_name)
for col in cols:
    if col == "TEST_TXT":
        pass
    elif col in ["min","max"]:
        new_df[col] = np.min(group_dataframe[col].astype(float))
    else:
        new_df[col] = np.mean(group_dataframe[col].astype(float))
But this doesn't seem to fill the dataframe at all. The new dataframe should have only one row: the mean of the values for most columns, and the min/max for the others. Can anyone help find the error (if there is one), or suggest a better way to achieve the same result?
Upvotes: 1
Views: 82
Reputation: 13666
aggregate seems to answer your needs:
df = pd.DataFrame(np.random.random((5,4)), columns=['count', 'dummy', 'mean', 'max'])
df.agg({'count': 'mean', 'mean':'mean', 'max':'max'})
Here I create a DataFrame with 4 columns and aggregate the columns of interest, each with its own function. The result is a Series:
count 0.493802
mean 0.532349
max 0.676727
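Applied to the columns from the question, the same idea might look like the sketch below. The sample values, the reduced column set and the group name "group_A" are made up for illustration; the dictionary maps every column except TEST_TXT to "mean" and then overrides "min" and "max". Since agg returns a Series here, to_frame().T is one way to turn it into a one-row DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for group_dataframe, with only a few of its columns
group_dataframe = pd.DataFrame({
    "TEST_TXT": ["a", "b", "c"],
    "mean": [1.0, 2.0, 3.0],
    "std": [0.1, 0.2, 0.3],
    "min": [0.5, 0.2, 0.9],
    "max": [4.0, 6.0, 5.0],
})

# Default to the mean for every numeric column, then override min and max
how = {col: "mean" for col in group_dataframe.columns if col != "TEST_TXT"}
how["min"] = "min"
how["max"] = "max"

summary = group_dataframe.agg(how)      # Series indexed by column name
new_df = summary.to_frame().T           # one-row DataFrame
new_df.insert(0, "Group", "group_A")    # hypothetical group label
print(new_df)
```

If the columns are stored as strings, as the astype(float) calls in the question suggest, they would need to be converted before aggregating.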
Upvotes: 2
Reputation: 8219
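I would first create a dictionary with the aggregated values and then convert it into a DataFrame:
res = {}
for col in cols:
    if col == "TEST_TXT":
        continue  # text column, excluded from the summary
    elif col == "min":
        res[col] = np.min(group_dataframe[col].astype(float))
    elif col == "max":
        res[col] = np.max(group_dataframe[col].astype(float))
    else:
        res[col] = np.mean(group_dataframe[col].astype(float))
# Wrap the dict in a list: a dict of scalars alone raises a ValueError
new_df = pd.DataFrame([res])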
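A likely reason the loop in the question leaves the frame empty, assuming new_df still has no rows when the scalar statistics are assigned: giving a scalar to a column of a DataFrame with zero rows creates the column but adds no rows. A minimal sketch of that behaviour, with made-up numbers:

```python
import pandas as pd

# Assigning a scalar to an empty frame broadcasts it over zero rows
empty = pd.DataFrame()
empty["mean"] = 1.5
print(len(empty), list(empty.columns))

# Building the row in one go, with an explicit index, yields one row
row = pd.DataFrame({"mean": 1.5, "min": 0.2}, index=[0])
print(row.shape)
```

Collecting the values first and constructing the frame once also avoids growing a DataFrame column by column.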
Upvotes: 1