will
will

Reputation: 689

Python Pandas - Map each key & value to a unique column

I am in the process of doing some data wrangling and I have come across the following problem.

In my CSV file (that I read through pandas) I have a column called effects. effects look like this:

       effects
Row 1: Euphoric 52% Creative 42% Happy 40% ...
Row 2: Relaxed 95% Happy 90% Euphoric 60% ...
Row 3: Happy 64% Uplifted 48% Relaxed 48% ...

and so on and so forth. now there is another column called name that looks like this:

       name
Row 1: Tommy
Row 2: Susan
Row 3: Bruce

What I am trying to do is map each key/value in the effects column to a unique column that corresponds to their appropriate name.

So essentially it would look like this:

       name  euphoric  creative   happy
Row 1: Tommy  52%       42%        40%
Row 2: Susan  60%       n/a        90%
Row 3: Bruce  n/a       n/a        64%

So you can see each effect with its corresponding percentage becomes a unique column for every name (and if the data is not present it's n/a or null).

I am assuming the answer may include converting the effects object into a list of dictionaries and perhaps some excel text to column action on top to make the mapping easier but I haven't been able to do it successfully.

any help here would be immensely appreciated.

Upvotes: 1

Views: 398

Answers (2)

Jaroslav Bezděk
Jaroslav Bezděk

Reputation: 7635

Another solution using for loop looks like this:

effects_data = []

for e in df.effects:
    effects_data.append(
        {
            key: val for key, val in zip(e.split(' ')[0::2], e.split(' ')[1::2])
        }
    )
    
print(df[['name']].join(pd.DataFrame(effects_data)))

The result is following:

    name Euphoric Creative Happy Relaxed Uplifted
0  Tommy      52%      42%   40%     NaN      NaN
1  Susan      60%      NaN   90%     95%      NaN
2  Bruce      NaN      NaN   64%     48%      48%

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150825

You can use str.extract like this:

df[['name']].join(df.effects.str.extractall('(\w+) (\d+\%)')
                    .reset_index()
                    .pivot(index='level_0', columns=0, values=1)
                 )

Output:

         name Creative Euphoric Happy Relaxed Uplifted
Row 1:  Tommy      42%      52%   40%     NaN      NaN
Row 2:  Susan      NaN      60%   90%     95%      NaN
Row 3:  Bruce      NaN      NaN   64%     48%      48%

Upvotes: 1

Related Questions