Osca

Reputation: 1694

Set index for aggregated dataframe

I ran a calculation over a list of DataFrames and would like the resulting DataFrame to use a RangeIndex. Instead, it uses one of the column names as the index, even though I set index=None.

import pandas as pd

d1 = {'id': [1, 2, 3, 4, 5], 'is_free': [True, False, False, True, True], 'level': ['Top', 'Mid', 'Top', 'Top', 'Low']}
d2 = {'id': [1, 3, 4, 5, 7], 'is_free': [True, True, False, False, False], 'level': ['Top', 'High', 'Top', 'Top', 'Low']}
d1 = pd.DataFrame(data=d1)
d2 = pd.DataFrame(data=d2)
df_list = [d1, d2]

# Count ids per is_free value for each frame; each result is a Series named 'id'
dfs = []
for i, df in enumerate(df_list):
    df = df.groupby('is_free')['id'].count()
    dfs.append(df)

# Build the result once, after the loop has collected every Series
df = pd.DataFrame(data=dfs, index=None)

It returns

is_free False   True
id      2       3
id      3       2

df.index returns

Index(['id', 'id'], dtype='object')

Upvotes: 1

Views: 33

Answers (1)

Quang Hoang

Reputation: 150765

From your code:

df = pd.DataFrame(data=dfs, index=None).reset_index(drop=True)
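
For context on why this works: each groupby result is a Series named 'id', and when pd.DataFrame is given a list of Series it uses those names as row labels. index=None is already the default, so passing it changes nothing. Below is a minimal sketch of that behaviour, using a trimmed copy of the question's data; the variable names counts, labelled and flat are only illustrative.

```python
import pandas as pd

# Trimmed copy of the question's data (the 'level' column is not needed here)
d1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'is_free': [True, False, False, True, True]})
d2 = pd.DataFrame({'id': [1, 3, 4, 5, 7],
                   'is_free': [True, True, False, False, False]})

# Each element is a Series whose name is 'id'
counts = [d.groupby('is_free')['id'].count() for d in (d1, d2)]
names = [s.name for s in counts]

# The Series names become the row labels of the new frame
labelled = pd.DataFrame(counts)

# Dropping those labels leaves a default RangeIndex
flat = labelled.reset_index(drop=True)

print(names)
print(labelled.index)
print(flat.index)
```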

In general, though, I would avoid appending in a loop. Use concat instead:

pd.concat({i:d.groupby('is_free')['id'].count() 
           for i,d in enumerate(df_list)}, 
          axis=1).T

Or use pd.DataFrame:

pd.DataFrame({i:d.groupby('is_free')['id'].count() 
              for i,d in enumerate(df_list)}).T

Output:

is_free  False  True 
0            2      3
1            3      2
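
One caveat, as far as I can tell: the 0 and 1 labels here come from the dict keys, so the index may be a plain integer Index rather than a true RangeIndex. If the exact index type matters, chaining reset_index(drop=True) should settle it. A minimal self-contained sketch, again on a trimmed copy of the question's data (the names result and ranged are only illustrative):

```python
import pandas as pd

# Trimmed copy of the question's data
d1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'is_free': [True, False, False, True, True]})
d2 = pd.DataFrame({'id': [1, 3, 4, 5, 7],
                   'is_free': [True, True, False, False, False]})
df_list = [d1, d2]

# One column per input frame, keyed by position, then transposed to rows
result = pd.concat({i: d.groupby('is_free')['id'].count()
                    for i, d in enumerate(df_list)},
                   axis=1).T

# Replace the key-derived labels with a default RangeIndex
ranged = result.reset_index(drop=True)

print(type(result.index).__name__)
print(type(ranged.index).__name__)
```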

Upvotes: 1
