Reputation: 236
I am using the Python Data Analysis Library (pandas) and have a data table like the following:
+----------------+
| ID | Frequency |
+----------------+
| AA | 4 |
| BB | 2 |
| CC | 3 |
| DD | 1 |
+----------------+
I would like to generate a result table in which each original row is repeated as many times as its Frequency value indicates. The new values of the Frequency field should count up from 1 to the original frequency value.
+----------------+
| ID | Frequency |
+----------------+
| AA | 1 |
| AA | 2 |
| AA | 3 |
| AA | 4 |
| BB | 1 |
| BB | 2 |
| CC | 1 |
| CC | 2 |
| CC | 3 |
| DD | 1 |
+----------------+
As I have no idea how to start solving this, any help would be appreciated.
Upvotes: 1
Views: 292
Reputation: 3491
I think there is a built-in way to do this; however, I can't find it in the documentation. The following seems to work for me:
>>> import pandas as pd
>>> d = pd.DataFrame([['AA',4],['BB',3],['CC',2],['DD',1]],columns=['id','frequency'])
>>> k=((' '+d.id)*d.frequency).apply(lambda x: pd.Series(x.split(' '))).stack()
>>> df = pd.DataFrame(k[(k.T != '')])
>>> df.reset_index(inplace=True)
>>> df.columns=['l0','frequency','id']
>>> df[['id','frequency']]
id frequency
0 AA 1
1 AA 2
2 AA 3
3 AA 4
4 BB 1
5 BB 2
6 BB 3
7 CC 1
8 CC 2
9 DD 1
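For comparison, here is a minimal sketch of a more direct route that avoids the string splitting. It assumes the same sample frame `d` as above and that every `id` is unique in the input; `expanded` is a name introduced here for illustration. It relies on `Index.repeat` to duplicate the rows and `GroupBy.cumcount` to number the copies.

```python
import pandas as pd

d = pd.DataFrame([['AA', 4], ['BB', 3], ['CC', 2], ['DD', 1]],
                 columns=['id', 'frequency'])

# Repeat each row label as many times as its frequency, then select those rows.
expanded = d.loc[d.index.repeat(d['frequency'])].reset_index(drop=True)

# Number the copies within each id; cumcount() starts at 0, so add 1.
# Assumption: ids are unique in the input, so each group is one original row.
expanded['frequency'] = expanded.groupby('id').cumcount() + 1

print(expanded)
```

If the ids could repeat in the input, grouping by the original row label instead of by `id` would be the safer choice.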
Upvotes: 1
Reputation: 518
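import pandas as pd

df = pd.DataFrame({"ID": ["AA", "BB", "CC", "DD"], "Frequency": [4, 3, 2, 1]})
rows = []
for idx, row in df.iterrows():
    # x counts down from the row's frequency to 1
    x = row["Frequency"]
    while x > 0:
        # Frequency - x + 1 counts up: 1, 2, ..., Frequency
        rows.append({"ID": row["ID"], "Frequency": row["Frequency"] - x + 1})
        x -= 1
ndf = pd.DataFrame(rows)
ndf = ndf[["ID", "Frequency"]]  # fix the column order
print(ndf)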
ID Frequency
0 AA 1
1 AA 2
2 AA 3
3 AA 4
4 BB 1
5 BB 2
6 BB 3
7 CC 1
8 CC 2
9 DD 1
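The same row-by-row idea can probably be written more compactly with a nested list comprehension. The sketch below uses the frequencies from the question (4, 2, 3, 1) rather than the sample above; `records` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["AA", "BB", "CC", "DD"], "Frequency": [4, 2, 3, 1]})

# For every (ID, Frequency) pair, emit (ID, 1), (ID, 2), ..., (ID, Frequency).
records = [
    (row_id, n)
    for row_id, freq in zip(df["ID"], df["Frequency"])
    for n in range(1, freq + 1)
]
ndf = pd.DataFrame(records, columns=["ID", "Frequency"])
print(ndf)
```

Iterating over the two columns with `zip` avoids building a Series per row, which `iterrows` does, so it should scale somewhat better on larger tables.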
Upvotes: 1