Reputation: 11
Suppose I have a Dataframe where some entries are arrays/lists/sets:
Row 0 [a, d, g, h, j]
Row 1 [d, j, a, p]
Row 2 [c, f, t, a]
Row 3 [a, h]
Row 4 []
Row 5 [d, j]
I want to transform it to a df where we have array entries mapped to the total frequency of their appearances (in that column only):
a c d f g h j p
4 1 3 1 1 2 3 1
How would I do that? (need it for bar plots).
Now suppose dataframe has another column:
Row 0 [a, d, g, h, j] 100
Row 1 [d, j, a, p] 300
Row 2 [c, f, t, a] 200
Row 3 [a, h] 160
Row 4 [] 40
Row 5 [d, j] 600
If I wanted to get the average value of col 2 per array entry, how would I do it? to clarify, I want:
a (100 + 300 + 200 + 160)/4 = 760/4 = 190
c 200/1 = 200
d (100 + 300)/2 = 200
f 200
... etc
Thanks, am learning pandas right now and it's really fun but also not quite trivial.
Upvotes: 0
Views: 77
Reputation: 9019
First, let's recreate your dataframe:
import pandas as pd
df = pd.DataFrame({
'Rows': ['Row 0','Row 1','Row 2','Row 3','Row 4','Row 5'],
'Letters': [['a', 'd', 'g', 'h', 'j'], ['d', 'j', 'a', 'p'], ['c', 'f', 't', 'a'], ['a', 'h'], [], ['d', 'j']],
'Values': [100, 300, 200, 160, 40, 600]
})
Yields:
Rows Letters Values
0 Row 0 [a, d, g, h, j] 100
1 Row 1 [d, j, a, p] 300
2 Row 2 [c, f, t, a] 200
3 Row 3 [a, h] 160
4 Row 4 [] 40
5 Row 5 [d, j] 600
Now, we can use explode()
to convert your list
-type column to individual rows and solve your first question:
df['Letters'].explode().value_counts().sort_index()
Yields:
a 4
c 1
d 3
f 1
g 1
h 2
j 3
p 1
t 1
For the second part, I would refer to @GrzegorzSkibinski's answer, which is how I would approach the problem. His solution efficiently combines both questions in your post:
df.explode('Letters').groupby('Letters').agg({'Values': 'mean'})
Yields:
Values
Letters
a 190.000000
c 200.000000
d 333.333333
f 200.000000
g 100.000000
h 130.000000
j 333.333333
p 300.000000
t 200.000000
Upvotes: 1
Reputation: 13387
With the inferred setup:
Setup:
data=[
["Row 0", list("adghj"), 100],
["Row 1", list("djap"), 300],
["Row 2", list("cfta"), 200],
["Row 3", list("ah"), 160],
["Row 4", [], 40],
["Row 5", list("dj"), 600]
]
import pandas as pd
df=pd.DataFrame(data, columns=["id", "wrds", "val"])
Solution for both questions:
df=df.explode("wrds")[["wrds", "val"]].groupby("wrds").agg(["count", "mean"])
Outputs:
val
count mean
wrds
a 4 190.000000
c 1 200.000000
d 3 333.333333
f 1 200.000000
g 1 100.000000
h 2 130.000000
j 3 333.333333
p 1 300.000000
t 1 200.000000
Upvotes: 2