cording12
cording12

Reputation: 166

Pandas explode dictionary to row while maintaining multi-index

Having now checked a multitude of Stack Overflow threads on this, I'm struggling to apply the answers to my particular use case so hoping someone can help me on my specific problem.

I'm trying to explode data out of a dictionary into two separate columns while maintaining a multi-index.

Here is what I currently have:

| short_url | platform | css_problem_files                                                    |
|-----------|----------|----------------------------------------------------------------------|
| /url_1/   | desktop  | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
|           | mobile   | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
| /url_2/   | desktop  | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |
|           | mobile   | {css_file_1: css_value, css_file_2: css_value, css_fle_3: css_value} |

and here is what I would like to achieve:

| short_url | platform | css_file   | css_value |
|-----------|----------|------------|-----------|
| /url_1/   | desktop  | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |
|           | mobile   | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |
| /url_2/   | desktop  | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |
|           | mobile   | css_file_1 | css_value |
|           |          | css_file_2 | css_value |
|           |          | css_file_3 | css_value |

The only thing I've come up with that's remotely close to what I need is the below, however this is creating over 200K rows when I'd expect it to be only in the thousands (and I've not included platform yet):

m = pd.DataFrame([*df['css_problem_files']], df.index).stack()\
      .rename_axis([None,'css_files']).reset_index(1, name='pct usage')

out = df[['short_url']].join(m)

Any assistance or a point in the right direction would be greatly appreciated

Upvotes: 0

Views: 701

Answers (2)

ifly6
ifly6

Reputation: 5331

It was annoying to parse your data frame from a Markdown table. It would have been much easier if it were in a DataFrame constructor.

df = pd.read_csv(io.StringIO('''short_url | platform | css_problem_files                                                   
/url_1/   | desktop  | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_1/   | mobile   | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/   | desktop  | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}
/url_2/   | mobile   | {'css_file_1': 'css_value', 'css_file_2': 'css_value', 'css_fle_3': 'css_value'}'''), sep=r'\s+\|\s+', index_col=[0, 1])
df['css_problem_files'] = df['css_problem_files'].apply(eval)

pd.DataFrame.apply(pd.Series) is the most straightforward way to make this happen, though it is really inefficient (see answer for illustration thereof). This uses the DataFrame constructor directly, which avoids that performance issue. It preserves the index by copying it over to the new data frame and provides columns explicitly:

temp_df = df['css_problem_files'].apply(dict.items).explode()
pd.DataFrame(temp_df.to_list(), index=temp_df.index, columns=['css_file', 'css_value'])
                      css_file  css_value
short_url platform                       
/url_1/   desktop   css_file_1  css_value
          desktop   css_file_2  css_value
          desktop    css_fle_3  css_value
          mobile    css_file_1  css_value
          mobile    css_file_2  css_value
          mobile     css_fle_3  css_value
/url_2/   desktop   css_file_1  css_value
          desktop   css_file_2  css_value
          desktop    css_fle_3  css_value
          mobile    css_file_1  css_value
          mobile    css_file_2  css_value
          mobile     css_fle_3  css_value

Upvotes: 0

fsimonjetz
fsimonjetz

Reputation: 5802

If you turn the dictionaries into lists of key-value pairs, you can explode them and then transform the result into two new columns with .apply(pd.Series) (and rename them to your liking) like so:

df = (df
      .css_problem_files.apply(dict.items) # turn into key value list
      .explode() # explode
      .apply(pd.Series) # turn into columns
      .rename(columns={0: "css_file", 1: "css_value"}) # rename
      )

Upvotes: 2

Related Questions