I have this data:
mm_id 11.02.2016 12.02.2016 13.02.2016 14.02.2016 15.02.2016 16.02.2016 17.02.2016
b2b5b93270cb217310b0719946b0afee 0 0 0 242 0 0 0
71f464119a2d43834c3be0af362487bc 0 23 0 0 0 0 0
afad5e5f9c3ba9df885eba1557ab7c73 1 0 0 22 0 0 0
I need to group the IDs by date, but I don't know how to transform the data into another format.
Desired output (a list of unique IDs for every date):
day
2016-03-01 [00051f002f5a0c179d7ce191ca2c6401, 00102b98bd9, ...]
2016-03-02 [00102b98bd9e71da3cf23fd1f599408d, 0012ea90a6d, ...]
2016-03-03 [00051f002f5a0c179d7ce191ca2c6401, 00102b98bd9, ...]
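For reference, a minimal sketch reconstructing the sample frame above (the dates are assumed to be plain string column labels):

import pandas as pd

# Rebuild the sample wide frame: one row per mm_id, one column per date.
df = pd.DataFrame({
    'mm_id': ['b2b5b93270cb217310b0719946b0afee',
              '71f464119a2d43834c3be0af362487bc',
              'afad5e5f9c3ba9df885eba1557ab7c73'],
    '11.02.2016': [0, 0, 1],
    '12.02.2016': [0, 23, 0],
    '13.02.2016': [0, 0, 0],
    '14.02.2016': [242, 0, 22],
    '15.02.2016': [0, 0, 0],
    '16.02.2016': [0, 0, 0],
    '17.02.2016': [0, 0, 0],
})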
Upvotes: 1
Views: 124
Reputation: 16241
First step: transform the dataframe
df = df.set_index('mm_id', drop=True).transpose()
Output:
mm_id b2b5b93270cb217310b0719946b0afee \
11.02.2016 0
12.02.2016 0
13.02.2016 0
14.02.2016 242
15.02.2016 0
16.02.2016 0
17.02.2016 0
mm_id 71f464119a2d43834c3be0af362487bc afad5e5f9c3ba9df885eba1557ab7c73
11.02.2016 0 1
12.02.2016 23 0
13.02.2016 0 0
14.02.2016 0 22
15.02.2016 0 0
16.02.2016 0 0
17.02.2016 0 0
Second step: apply a function
In [10]: df.apply(lambda row: row[row != 0].index.tolist(), axis=1)
Out[10]:
11.02.2016 ['afad5e5f9c3ba9df885eba1557ab7c73']
12.02.2016 ['71f464119a2d43834c3be0af362487bc']
13.02.2016 []
14.02.2016 ['b2b5b93270cb217310b0719946b0afee', 'afad5e5f...
15.02.2016 []
16.02.2016 []
17.02.2016 []
This gets the IDs of the users with a non-zero value on each date.
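For reference, a minimal end-to-end sketch combining both steps, plus an optional conversion of the DD.MM.YYYY labels into real dates so the index matches the asker's YYYY-MM-DD format (df is assumed to be the original wide frame):

import pandas as pd

# Transpose so dates become the index, then collect the non-zero IDs per date.
out = (df.set_index('mm_id')
         .transpose()
         .apply(lambda row: row[row != 0].index.tolist(), axis=1))

# Optional: parse the DD.MM.YYYY labels into a DatetimeIndex.
out.index = pd.to_datetime(out.index, format='%d.%m.%Y')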
Upvotes: 1
Reputation: 31662
IIUC you could do the following with applymap, numpy broadcasting, pd.melt, and groupby:
In [94]: df
Out[94]:
mm_id 11.02.2016 12.02.2016 13.02.2016 14.02.2016 15.02.2016 16.02.2016 17.02.2016
0 b2b5b93270cb217310b0719946b0afee 0 0 0 242 0 0 0
1 71f464119a2d43834c3be0af362487bc 0 23 0 0 0 0 0
2 afad5e5f9c3ba9df885eba1557ab7c73 1 0 0 22 0 0 0
import numpy as np
import pandas as pd

# values from the date columns only
df_values = df.iloc[:, 1:]
# first change numbers to dummies for dates, i.e. 1 for non-zero elements
df2 = df_values.applymap(lambda x: 1 if x >= 1 else 0)
# fill data with mm_id: string * 1 keeps the id, string * 0 gives ''
df3 = pd.DataFrame(df2.values * df.mm_id.to_numpy()[:, None], columns=df2.columns)
# melt all together
df4 = pd.melt(df3, var_name='date', value_name='id')
# fill empty strings with NaN to drop in the final groupby
df4 = df4.replace('', np.nan)
# final groupby by date, select the id column, then drop NaNs and keep unique values
df5 = df4.groupby('date')['id'].apply(lambda x: x.dropna().unique())
Results from each step:
In [133]: df_values
Out[133]:
11.02.2016 12.02.2016 13.02.2016 14.02.2016 15.02.2016 16.02.2016 17.02.2016
0 0 0 0 242 0 0 0
1 0 23 0 0 0 0 0
2 1 0 0 22 0 0 0
In [146]: df2
Out[146]:
11.02.2016 12.02.2016 13.02.2016 14.02.2016 15.02.2016 16.02.2016 17.02.2016
0 0 0 0 1 0 0 0
1 0 1 0 0 0 0 0
2 1 0 0 1 0 0 0
In [147]: df3
Out[147]:
11.02.2016 12.02.2016 13.02.2016 14.02.2016 15.02.2016 16.02.2016 17.02.2016
0 b2b5b93270cb217310b0719946b0afee
1 71f464119a2d43834c3be0af362487bc
2 afad5e5f9c3ba9df885eba1557ab7c73 afad5e5f9c3ba9df885eba1557ab7c73
In [148]: df4
Out[148]:
date id
0 11.02.2016 NaN
1 11.02.2016 NaN
2 11.02.2016 afad5e5f9c3ba9df885eba1557ab7c73
3 12.02.2016 NaN
4 12.02.2016 71f464119a2d43834c3be0af362487bc
5 12.02.2016 NaN
6 13.02.2016 NaN
7 13.02.2016 NaN
8 13.02.2016 NaN
9 14.02.2016 b2b5b93270cb217310b0719946b0afee
10 14.02.2016 NaN
11 14.02.2016 afad5e5f9c3ba9df885eba1557ab7c73
12 15.02.2016 NaN
13 15.02.2016 NaN
14 15.02.2016 NaN
15 16.02.2016 NaN
16 16.02.2016 NaN
17 16.02.2016 NaN
18 17.02.2016 NaN
19 17.02.2016 NaN
20 17.02.2016 NaN
In [149]: df5
Out[149]:
date
11.02.2016 [afad5e5f9c3ba9df885eba1557ab7c73]
12.02.2016 [71f464119a2d43834c3be0af362487bc]
13.02.2016 []
14.02.2016 [b2b5b93270cb217310b0719946b0afee, afad5e5f9c3...
15.02.2016 []
16.02.2016 []
17.02.2016 []
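For comparison, a more compact sketch of the same melt + groupby idea that melts the original wide frame directly and filters the non-zero values first (same df assumed):

import pandas as pd

# Melt the wide frame, keep only non-zero entries,
# then collect the unique IDs per date.
melted = pd.melt(df, id_vars='mm_id', var_name='date', value_name='value')
out = melted[melted['value'] != 0].groupby('date')['mm_id'].unique()

Note that, unlike df5, dates where every value is zero drop out of the result here; reindex with the original date columns if those empty rows are needed.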
Upvotes: 0