tandem
tandem

Reputation: 2238

pandas dataframe - columns with selected start string and specified columns

In the given dataframe below, I would like to sanitise the counters (MEM_TRANS_RETIRED, for instance) and add to the dataframe the columns load,rps,th95

       load        rps     th95    energy     ...       MEM_TRANS_RETIRED-34  PERF_COUNT_HW_CACHE_L1D-34  PERF_COUNT_HW_CACHE_L1I-34     map_freq
0    500.0k  346222.62    12.62   7270.22     ...                  154287.14                   591053.74                2.918521e+07   6C-1.70GHz
1    400.0k  402628.34     2.25  12026.40     ...                  189915.07                   627043.91                2.867945e+07  10C-2.10GHz
2    500.0k  283508.27    15.52   5662.74     ...                  140790.31                  1431892.98                4.253950e+07   6C-1.30GHz

this is how I am doing it

self.unique_counters = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
for counter in self.unique_counters:
 new = self.dfile.loc[:, self.dfile.columns.str.startswith(counter)]

This, however, only gives the selected columns without the additional columns mentioned above.

     PERF_COUNT_HW_CPU_CYCLES-0  PERF_COUNT_HW_CPU_CYCLES-2             ...               PERF_COUNT_HW_CPU_CYCLES-32  PERF_COUNT_HW_CPU_CYCLES-34
0                  6.020913e+08                6.021277e+08             ...                              5.109342e+06                 2.556039e+06
1                  4.781879e+08                4.783621e+08             ...                              3.095814e+06                 2.795868e+06
2                  4.841784e+08                4.844846e+08             ...                              2.389396e+06                 5.550159e+06

How do I get both the counter with a selected start string and some specified columns This is the expected output

       load        rps     th95    energy   PERF_COUNT_HW_CPU_CYCLES-0  PERF_COUNT_HW_CPU_CYCLES-2             ...               PERF_COUNT_HW_CPU_CYCLES-32  PERF_COUNT_HW_CPU_CYCLES-34
0    500.0k  346222.62    12.62   7270.22   6.020913e+08                6.021277e+08             ...                              5.109342e+06                 2.556039e+06
1    400.0k  402628.34     2.25  12026.40   4.781879e+08                4.783621e+08             ...                              3.095814e+06                 2.795868e+06
2    500.0k  283508.27    15.52   5662.74   4.841784e+08                4.844846e+08             ...                              2.389396e+06                 5.550159e+06

Upvotes: 1

Views: 27

Answers (1)

jezrael
jezrael

Reputation: 863226

I believe you need for one new DataFrame:

L = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]

new = self.dfile.loc[:, self.dfile.columns.str.startswith(tuple(L))]

df = pd.concat([self.dfile[['load','rps','th95','energy']], new], axis=1)

Or in list comprehension create list of DataFrames:

self.unique_counters = [x[:-2] for x in self.dfile_keys[6:] if x.endswith('-0')]
dfs = [self.dfile.loc[:, self.dfile.columns.str.startswith(counter)] 
       for counter in self.unique_counters]

df = pd.concat([self.dfile[['load','rps','th95','energy']], dfs], axis=1)

Upvotes: 1

Related Questions