ccc

Reputation: 21

In a pandas DataFrame, how do I concatenate row values per group, aligned by time?

I have a labeled dataset. I would like to group the data so that all the distinct groups defined by "RPI_ID" are synced in time and the per-panel "Status" values are concatenated into a single "PanelStatus" string per timestamp. For example,

                  ts_unix        RPI_ID   Panel       Status
2017-04-18 06:46:22-07:00  1.492523e+09      26            1
2017-04-18 06:46:22-07:00  1.492523e+09      27            0
2017-04-18 06:46:22-07:00  1.492523e+09      28            1
2017-04-18 06:46:22-07:00  1.492523e+09      29            0
2017-04-18 06:46:22-07:00  1.492523e+09      30            0

should turn into

ts_unix                    PanelStatus_JOINED
2017-04-18 06:46:22-07:00  '10100'

Any ideas?

Upvotes: 2

Views: 82

Answers (4)

Vaishali

Reputation: 38415

You can group by ts_unix and use sum(), which concatenates the values once Status has been cast to strings:

df['Status'] = df['Status'].astype(str)
df.groupby('ts_unix').Status.sum().reset_index()\
.rename(columns = {'Status': 'PanelStatus_JOINED'})

You get

    ts_unix         PanelStatus_JOINED
0   06:46:22-07:00  10100

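Or name the result column in the aggregation itself. Passing a renaming dict to agg() on a grouped Series was removed in pandas 1.0, so this uses a keyword argument instead (pandas 0.25 or later):

df.groupby('ts_unix').Status.agg(PanelStatus_JOINED='sum').reset_index()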

to get

    ts_unix         PanelStatus_JOINED
0   06:46:22-07:00  10100

Upvotes: 2

piRSquared

Reputation: 294218

Use get_dummies on ts_unix, then take the dot product with Status cast to strings:

pd.get_dummies(df.ts_unix).T.dot(df.Status.astype(str))

2017-04-18 06:46:22-07:00    10100
dtype: object

To see how this works, here is a larger example.

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
        ts_unix=pd.date_range('2015-03-31', periods=4).repeat(4),
        Status=np.random.randint(10, size=16)
    ))

print(df)

    Status    ts_unix
0        6 2015-03-31
1        7 2015-03-31
2        6 2015-03-31
3        8 2015-03-31
4        6 2015-04-01
5        8 2015-04-01
6        3 2015-04-01
7        2 2015-04-01
8        9 2015-04-02
9        4 2015-04-02
10       7 2015-04-02
11       8 2015-04-02
12       0 2015-04-03
13       1 2015-04-03
14       9 2015-04-03
15       9 2015-04-03

Then:

pd.get_dummies(df.ts_unix).T.dot(df.Status.astype(str))

2015-03-31    6768
2015-04-01    6832
2015-04-02    9478
2015-04-03    0199
dtype: object
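
Why the dot product yields concatenated strings can be sketched in plain Python. This is only an illustration of the arithmetic, not how pandas or NumPy implement it; the helper `dot_strings` and the indicator lists are made up for the example and use the first eight Status values printed above. Multiplying a string by 1 keeps it, multiplying by 0 gives an empty string, and "adding" strings concatenates them.

```python
# Status values of the first eight rows above, already cast to strings.
statuses = ["6", "7", "6", "8", "6", "8", "3", "2"]

# One indicator per row: 1 if the row belongs to the group, 0 otherwise.
# These play the role of one get_dummies column each (hypothetical names).
in_first_group = [1, 1, 1, 1, 0, 0, 0, 0]
in_second_group = [0, 0, 0, 0, 1, 1, 1, 1]


def dot_strings(indicators, strings):
    """Dot product where * repeats a string and + concatenates."""
    result = ""
    for flag, s in zip(indicators, strings):
        result += flag * s  # 1 * "6" == "6", 0 * "6" == ""
    return result


print(dot_strings(in_first_group, statuses))   # 6768
print(dot_strings(in_second_group, statuses))  # 6832
```

The two printed values match the 2015-03-31 and 2015-04-01 entries in the output above.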

Upvotes: 1

zipa

Reputation: 27869

You can cast Status to strings and join them within each ts_unix group:

df['Status'] = df['Status'].astype(str)
result = df.groupby('ts_unix')['Status'].apply(lambda x: ''.join(x))
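
A self-contained sketch of the same join-per-group idea follows. The rows are modelled on the question, but the shuffled order and the second timestamp are made up for illustration. It also sorts by Panel first, on the assumption that the digits should appear in panel order; drop the sort if row order already carries that meaning.

```python
import pandas as pd

# Sample rows modelled on the question. The shuffled order and the second
# timestamp are invented to show ordering and per-group behaviour.
df = pd.DataFrame({
    "ts_unix": ["2017-04-18 06:46:22-07:00"] * 5
               + ["2017-04-18 06:47:22-07:00"] * 5,
    "Panel": [28, 26, 30, 27, 29, 26, 27, 28, 29, 30],
    "Status": [1, 1, 0, 0, 0, 0, 1, 1, 0, 1],
})

joined = (
    df.sort_values(["ts_unix", "Panel"])                   # digits in panel order
      .assign(Status=lambda d: d["Status"].astype(str))    # str.join needs strings
      .groupby("ts_unix")["Status"]
      .agg("".join)                                        # one string per timestamp
      .rename("PanelStatus_JOINED")
      .reset_index()
)
print(joined)
```

Because the result stays a string, a group whose first panel has status 0 keeps its leading zero.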

Upvotes: 0

Abigail

Reputation: 76

It depends somewhat on how you want to sync things in time, but you can use a groupby apply to concatenate the statuses and a groupby first() to get the first timestamp per group. (On pandas 2.0 and later, nth(0) keeps the original row index instead of the group key, which breaks the join below.)

import pandas as pd

data = pd.DataFrame(data={
    'ts_unix': ['2017-04-18 06:46:22-07:00'] * 5,
    'RPI_ID': [1492523000] * 5,
    'Panel': range(26, 31),
    'Status': [1, 0, 1, 0, 0],
})
# Join each group's statuses into one string; wrapping this in int() would drop leading zeros.
concat_status = (
    data.groupby('RPI_ID')['Status']
        .apply(lambda x: ''.join(map(str, x.values)))
        .to_frame()
        .rename(columns={'Status': 'PanelStatus_JOINED'})
)
# First timestamp per RPI_ID, indexed by RPI_ID so the join lines up.
start_time = data.groupby('RPI_ID')[['ts_unix']].first()
rv = start_time.join(concat_status)
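
The two group-wise steps can also be sketched as a single named aggregation, assuming pandas 0.25 or later. This is one possible variant rather than the only way to do it; the sample frame mirrors the one in this answer, and the statuses are kept as strings so leading zeros survive.

```python
import pandas as pd

# Sample frame mirroring the one in this answer.
data = pd.DataFrame({
    "ts_unix": ["2017-04-18 06:46:22-07:00"] * 5,
    "RPI_ID": [1492523000] * 5,
    "Panel": range(26, 31),
    "Status": [1, 0, 1, 0, 0],
})

rv = (
    data.assign(Status=data["Status"].astype(str))  # str.join needs strings
        .groupby("RPI_ID")
        .agg(
            ts_unix=("ts_unix", "first"),                        # first timestamp per group
            PanelStatus_JOINED=("Status", lambda s: "".join(s)), # concatenated statuses
        )
)
print(rv)
```

The result is indexed by RPI_ID, with one row per group holding the first timestamp and the joined status string.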

Upvotes: 0
