Reputation: 21
I have a dataset that I populate and label. I would like to group the data so that all the distinct groups defined by "RPI_ID" are synced in time and the rows of "Status" are concatenated into a single "PanelStatus_JOINED" string. For example,
ts_unix RPI_ID Panel Status
2017-04-18 06:46:22-07:00 1.492523e+09 26 1
2017-04-18 06:46:22-07:00 1.492523e+09 27 0
2017-04-18 06:46:22-07:00 1.492523e+09 28 1
2017-04-18 06:46:22-07:00 1.492523e+09 29 0
2017-04-18 06:46:22-07:00 1.492523e+09 30 0
should turn into
ts_unix PanelStatus_JOINED
2017-04-18 06:46:22-07:00 '10100'
Any ideas?
Upvotes: 2
Views: 82
Reputation: 38415
You can groupby and use sum() — summing strings concatenates them:
df['Status'] = df['Status'].astype(str)
df.groupby('ts_unix').Status.sum().reset_index()\
.rename(columns = {'Status': 'PanelStatus_JOINED'})
You get
ts_unix PanelStatus_JOINED
0 2017-04-18 06:46:22-07:00 10100
OR
df.groupby('ts_unix').agg(PanelStatus_JOINED=('Status', 'sum')).reset_index()
to get
ts_unix PanelStatus_JOINED
0 2017-04-18 06:46:22-07:00 10100
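For reference, a self-contained sketch of the first approach, built from the question's sample data (RPI_ID and Panel omitted for brevity):

```python
import pandas as pd

# Toy frame mirroring the question's example.
df = pd.DataFrame({
    'ts_unix': ['2017-04-18 06:46:22-07:00'] * 5,
    'Status': [1, 0, 1, 0, 0],
})

# Summing strings concatenates them, so cast to str before the groupby-sum.
out = (
    df.assign(Status=df['Status'].astype(str))
      .groupby('ts_unix')['Status'].sum()
      .reset_index()
      .rename(columns={'Status': 'PanelStatus_JOINED'})
)
print(out)  # PanelStatus_JOINED is '10100'
```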
Upvotes: 2
Reputation: 294218
Use get_dummies on ts_unix, then dot with Status:
pd.get_dummies(df.ts_unix).T.dot(df.Status.astype(str))
2017-04-18 06:46:22-07:00 10100
dtype: object
To get a better appreciation for this, here is a richer example.
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    ts_unix=pd.date_range('2015-03-31', periods=4).repeat(4),
    Status=np.random.randint(10, size=16)
))
print(df)
Status ts_unix
0 6 2015-03-31
1 7 2015-03-31
2 6 2015-03-31
3 8 2015-03-31
4 6 2015-04-01
5 8 2015-04-01
6 3 2015-04-01
7 2 2015-04-01
8 9 2015-04-02
9 4 2015-04-02
10 7 2015-04-02
11 8 2015-04-02
12 0 2015-04-03
13 1 2015-04-03
14 9 2015-04-03
15 9 2015-04-03
Then:
pd.get_dummies(df.ts_unix).T.dot(df.Status.astype(str))
2015-03-31 6768
2015-04-01 6832
2015-04-02 9478
2015-04-03 0199
dtype: object
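Why this works: the indicator matrix from get_dummies holds 0s and 1s, and multiplying a Python string by 0 yields '' while multiplying by 1 yields the string itself, so each row of the dot product concatenates exactly the statuses of that group. A minimal sketch with hypothetical group labels t1/t2 (dtype=int keeps the dummies numeric, since newer pandas defaults them to bool):

```python
import pandas as pd

df = pd.DataFrame({
    'ts_unix': ['t1', 't1', 't2'],
    'Status': [1, 0, 9],
})

# One indicator column per group; the 0/1 entries multiply cleanly
# with strings (0 * 's' == '', 1 * 's' == 's').
dummies = pd.get_dummies(df.ts_unix, dtype=int)
joined = dummies.T.dot(df.Status.astype(str))
print(joined)  # t1 -> '10', t2 -> '9'
```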
Upvotes: 1
Reputation: 27869
You can try this:
df['Status'] = df['Status'].astype(str)
result = df.groupby('ts_unix')['Status'].apply(lambda x: ''.join(x))
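As a runnable illustration, using hypothetical data with a leading zero, which the string join preserves (an integer cast would drop it):

```python
import pandas as pd

df = pd.DataFrame({
    'ts_unix': ['2017-04-18 06:46:22-07:00'] * 5,
    'Status': [0, 1, 0, 1, 1],
})

df['Status'] = df['Status'].astype(str)
result = df.groupby('ts_unix')['Status'].apply(lambda x: ''.join(x))
print(result)  # the single group joins to '01011'
```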
Upvotes: 0
Reputation: 76
It depends somewhat on how you want to sync things in time, but you can use a groupby apply to concatenate and a groupby first() to get the first timestamp per group. Keeping the concatenation as a string (rather than casting to int) preserves any leading zeros and matches the '10100' in the question.
import pandas as pd

data = pd.DataFrame({
    'ts_unix': ['2017-04-18 06:46:22-07:00'] * 5,
    'RPI_ID': [1492523000] * 5,
    'Panel': range(26, 31),
    'Status': [1, 0, 1, 0, 0],
})
concat_status = (
    data.groupby('RPI_ID')['Status']
        .apply(lambda x: ''.join(map(str, x)))
        .to_frame()
        .rename(columns={'Status': 'PanelStatus_JOINED'})
)
start_time = data.groupby('RPI_ID')[['ts_unix']].first()
rv = start_time.join(concat_status)
Upvotes: 0