Reputation: 689
I am in the process of doing some data wrangling and I have come across the following problem.
In my CSV file (that I read through pandas) I have a column called effects. effects look like this:
effects
Row 1: Euphoric 52% Creative 42% Happy 40% ...
Row 2: Relaxed 95% Happy 90% Euphoric 60% ...
Row 3: Happy 64% Uplifted 48% Relaxed 48% ...
and so on and so forth. now there is another column called name that looks like this:
name
Row 1: Tommy
Row 2: Susan
Row 3: Bruce
What I am trying to do is map each key/value in the effects column to a unique column that corresponds to their appropriate name.
So essentially it would look like this:
name euphoric creative happy
Row 1: Tommy 52% 42% 40%
Row 2: Susan 60% n/a 90%
Row 3: Bruce n/a n/a 64%
So you can see each effect with its corresponding percentage becomes a unique column for every name (and if the data is not present it's n/a or null).
I am assuming the answer may include converting the effects object into a list of dictionaries and perhaps some excel text to column action on top to make the mapping easier but I haven't been able to do it successfully.
any help here would be immensely appreciated.
Upvotes: 1
Views: 398
Reputation: 7635
Another solution using for loop looks like this:
effects_data = []
for e in df.effects:
effects_data.append(
{
key: val for key, val in zip(e.split(' ')[0::2], e.split(' ')[1::2])
}
)
print(df[['name']].join(pd.DataFrame(effects_data)))
The result is following:
name Euphoric Creative Happy Relaxed Uplifted
0 Tommy 52% 42% 40% NaN NaN
1 Susan 60% NaN 90% 95% NaN
2 Bruce NaN NaN 64% 48% 48%
Upvotes: 1
Reputation: 150825
You can use str.extract
like this:
df[['name']].join(df.effects.str.extractall('(\w+) (\d+\%)')
.reset_index()
.pivot(index='level_0', columns=0, values=1)
)
Output:
name Creative Euphoric Happy Relaxed Uplifted
Row 1: Tommy 42% 52% 40% NaN NaN
Row 2: Susan NaN 60% 90% 95% NaN
Row 3: Bruce NaN NaN 64% 48% 48%
Upvotes: 1