Reputation: 13
I have this file with 19 columns of mixed dtypes. One of the column Names contain elements which are separated by space. For example:
Col1 Col2
adress1 x
adress2 a b
adress3 x c
adress4 a x d
What I want to do is go over Col2 and find out how many times each element occurs and put the result in a new column along with its corresponding in Col1
Note the above columns were already processed as a Dataframe. I have this which somewhat give me the results but not what I want ultimately.
new_df = pd.Dataframe(old_df.Col2.str.split(' ').tolist(), index=old_df.Col1).stack
How do I put the results in a new column (replacing Col2) and also have the remaining columnS?
Something like:
Col1 Col2 Col3
adress1 x something
adress2 a something1
adress2 b something1
adress3 x NaN
adress3 c NaN
Also calculate occurrence of items in Col2?
Upvotes: 0
Views: 56
Reputation: 323226
We can do split
first then do explode
s=df.assign(Col2=df.Col2.str.split()).explode('Col2')
s=s.groupby(['Col1','Col2']).size().to_frame('count').reset_index()
Out[48]:
Col1 Col2 count
0 adress1 x 1
1 adress2 a 1
2 adress2 b 1
3 adress3 c 1
4 adress3 x 1
5 adress4 a 1
6 adress4 d 1
7 adress4 x 1
Upvotes: 1