Reputation: 420
It is easiest to explain with a sample of the dataframe:
TimeStamp 382.098 382.461 383.185 383.548
10:28:00 0.012448 0.012362 0.0124485 0.012362
10:30:00 0.0124135 0.0123965 0.0124135 0.012431
10:32:00 0.0551035 0.0551725 0.055931 0.0563105
10:34:00 0.055586 0.0557245 0.056655 0.0569485
10:36:00 0.055586 0.055776 0.0568105 0.057362
I want my output to be:
TimeStamp 382 383
10:28:00 0.012405 0.01240525
10:30:00 0.012405 0.01242225
10:32:00 0.055138 0.05612075
10:34:00 0.05565525 0.05680175
10:36:00 0.055681 0.05708625
In other words, I want to compare the column names, and wherever they share the same whole-number part (e.g. 382.098 and 382.461), the output should have a single column holding the mean of those columns for each TimeStamp.
My idea was to round the column headers to whole numbers and then use .mean() along axis=1 to average the columns that end up with the same header. But I get an error when I call round on the DataFrame's column Index.
EDIT: based on the answers, I used
df.rename(columns=dict(zip(df.columns[0:], df.columns[0:]\
.values.astype(float).round().astype(str))),inplace=True)
df = df.groupby(df.columns[0:], axis=1).mean()
This messes up both the column names and the values instead of giving me the mean grouped by column name, and I have no idea why!
Upvotes: 14
Views: 1377
Reputation: 571
Generalised solution
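import pandas as pd

df = pd.DataFrame({383.045: [1, 2], 383.96: [3, 4], 383.78: [5, 5], 343: [9, 11]})
# Keep only the whole-number part of each label (383.96 -> 383)
df.columns = [int(i) for i in df.columns]
for i in set(df.columns):
    # A duplicated label selects a DataFrame (2-D); a unique one selects a Series
    if len(df[i].shape) == 2:
        mean = df[i].T.sum() / float(df[i].shape[1])  # row-wise mean of the duplicates
        df = df.drop(columns=[i])
        df[i] = mean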
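The loop above can also be written without iterating over labels. A minimal sketch on the same toy frame: transpose so the truncated (duplicated) labels become row labels, average the rows sharing a label with groupby(level=0), and transpose back. Since groupby sorts its keys by default, the resulting columns come out sorted by label.

```python
import pandas as pd

# Same toy frame as above
df = pd.DataFrame({383.045: [1, 2], 383.96: [3, 4], 383.78: [5, 5], 343: [9, 11]})
# Keep only the whole-number part of each label: 383.96 -> 383
df.columns = [int(c) for c in df.columns]

# Transpose so the duplicated labels become row labels, average the rows
# that share a label (level=0 groups on the index values), transpose back.
result = df.T.groupby(level=0).mean().T
print(result)
```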
Upvotes: 3
Reputation: 109546
To round the column labels to the nearest integer, you can group on a list comprehension that rounds each column name (except the first, which is TimeStamp) to the nearest whole number and converts it to an int:
>>> (df
...      .set_index('TimeStamp')
...      .groupby([int(round(col, 0)) for col in df.columns[1:].astype(float)], axis=1)
...      .mean())
382 383 384
TimeStamp
10:28:00 0.012405 0.012448 0.012362
10:30:00 0.012405 0.012414 0.012431
10:32:00 0.055138 0.055931 0.056310
10:34:00 0.055655 0.056655 0.056948
10:36:00 0.055681 0.056810 0.057362
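The third column (384) appears because rounding is not the same as keeping the whole-number part: rounding sends 383.548 up to 384. The .round().astype(str) rename in the question's EDIT likely runs into the same thing, and it additionally produces labels such as '382.0'. A plain-Python sketch on the four sample headers makes the difference visible:

```python
# Column headers from the sample dataframe in the question
labels = [382.098, 382.461, 383.185, 383.548]

# Rounding to the nearest integer, as round() does
rounded = [round(x) for x in labels]

# Keeping only the whole-number part (int() truncates toward zero)
truncated = [int(x) for x in labels]

# Rounding to 0 decimals keeps a float, so its string form ends in ".0"
as_str = [str(round(x, 0)) for x in labels]

print(rounded)    # [382, 382, 383, 384]
print(truncated)  # [382, 382, 383, 383]
print(as_str)     # ['382.0', '382.0', '383.0', '384.0']
```

Replacing int(round(col, 0)) with int(col) in the list comprehension should give the two groups the question asks for, since the labels here are positive.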
Upvotes: 2
Reputation: 30605
Another method uses pd.to_numeric, a slight variant of @coldspeed's answer:
df = df.set_index('TimeStamp')
df.groupby(pd.to_numeric(df.columns).astype(int), axis=1).mean()
382 383
TimeStamp
10:28:00 0.012405 0.012405
10:30:00 0.012405 0.012422
10:32:00 0.055138 0.056121
10:34:00 0.055655 0.056802
10:36:00 0.055681 0.057086
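As a check of the to_numeric keys against the expected output in the question, here is a self-contained sketch. Loading the sample through io.StringIO and read_csv is an assumption standing in for however the real frame is built. Because groupby(..., axis=1) is deprecated as of pandas 2.1, the sketch transposes, groups the rows, and transposes back instead of passing axis=1.

```python
import io

import pandas as pd

# Sample data copied from the question (whitespace-separated)
raw = """TimeStamp 382.098 382.461 383.185 383.548
10:28:00 0.012448 0.012362 0.0124485 0.012362
10:30:00 0.0124135 0.0123965 0.0124135 0.012431
10:32:00 0.0551035 0.0551725 0.055931 0.0563105
10:34:00 0.055586 0.0557245 0.056655 0.0569485
10:36:00 0.055586 0.055776 0.0568105 0.057362
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="TimeStamp")

# Headers are read as strings; to_numeric makes them floats and
# astype(int) keeps only the whole-number part (383.548 -> 383).
keys = pd.to_numeric(df.columns).astype(int)

# Transpose so the headers become rows, average rows sharing a key,
# then transpose back to get one column per whole number.
result = df.T.groupby(keys).mean().T
print(result)
```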
Upvotes: 3
Reputation: 323306
With np.floor, rename and groupby:
df.rename(columns=dict(zip(df.columns[1:], np.floor(df.columns[1:].values.astype(float)).astype(str))),inplace=True)
df.set_index('TimeStamp').groupby(level=0,axis=1).mean().reset_index()
Out[171]:
TimeStamp 382.0 383.0
0 10:28:00 0.012405 0.012405
1 10:30:00 0.012405 0.012422
2 10:32:00 0.055138 0.056121
3 10:34:00 0.055655 0.056802
4 10:36:00 0.055681 0.057086
Upvotes: 5
Reputation: 402603
Use groupby along axis 1 (the columns) with a lambda:
df.set_index('TimeStamp', inplace=True)
df.groupby(by=lambda x: int(x.split('.')[0]), axis=1).mean()
382 383
TimeStamp
10:28:00 0.012405 0.012405
10:30:00 0.012405 0.012422
10:32:00 0.055138 0.056121
10:34:00 0.055655 0.056802
10:36:00 0.055681 0.057086
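The split('.') lambda assumes the headers are strings. If the frame was built with float column labels (as in the DataFrame constructor of the "Generalised solution" answer), x.split fails because a float has no split method. A sketch of a key function that accepts either form; integer_part and the two one-row frames are hypothetical, and it assumes positive labels like those in the question, where int() truncation equals the whole-number part:

```python
import pandas as pd


def integer_part(label):
    """Hypothetical helper: map '382.098' or 382.098 to 382.

    int() truncates toward zero, which equals the whole-number part
    for positive labels such as the ones in the question.
    """
    return int(float(label))


# Hypothetical one-row frames: string headers vs. float headers
str_df = pd.DataFrame([[1.0, 3.0, 5.0]], columns=["382.098", "382.461", "383.185"])
float_df = pd.DataFrame([[1.0, 3.0, 5.0]], columns=[382.098, 382.461, 383.185])

# groupby applies a callable to each label of the grouped axis; after
# transposing, the former column headers are that axis.
str_result = str_df.T.groupby(integer_part).mean().T
float_result = float_df.T.groupby(integer_part).mean().T
print(str_result)
print(float_result)
```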
Upvotes: 13
Reputation: 21274
Rename the columns with a type conversion, move TimeStamp to the index, and then use groupby to get the column means:
df.rename(columns=lambda x: int(float(x)) if x!="TimeStamp" else x, inplace=True)
df.set_index("TimeStamp", inplace=True)
df
382 382 383 383
TimeStamp
10:28:00 0.012448 0.012362 0.012448 0.012362
10:30:00 0.012414 0.012396 0.012414 0.012431
10:32:00 0.055103 0.055172 0.055931 0.056310
10:34:00 0.055586 0.055725 0.056655 0.056948
10:36:00 0.055586 0.055776 0.056810 0.057362
df.groupby(df.columns, axis=1).mean()
382 383
TimeStamp
10:28:00 0.012405 0.012405
10:30:00 0.012405 0.012422
10:32:00 0.055138 0.056121
10:34:00 0.055655 0.056802
10:36:00 0.055681 0.057086
Upvotes: 6