Reputation: 967
I have a dataframe that's read from a csv.
time node txrx src dest txid hops
0 34355146 2 TX 2 1 1 NaN
1 34373907 1 RX 2 1 1 1.0
2 44284813 2 TX 2 1 2 NaN
3 44302557 1 RX 2 1 2 1.0
4 44596500 3 TX 3 1 2 NaN
5 44630682 1 RX 3 1 2 2.0
6 50058251 2 TX 2 1 3 NaN
7 50075994 1 RX 2 1 3 1.0
8 51338658 3 TX 3 1 3 NaN
9 51382629 1 RX 3 1 3 2.0
I need to be able to create a new dataframe which takes the values in the TX/RX rows, to create a single row for each pair:
The df should then look like this:
tx_time rx_time src dest txid hops
0 34355146 34373907 2 1 1 1
1 44284813 44302557 2 1 2 1
2 44596500 44630682 3 1 2 2
3 50058251 50075994 2 1 3 1
4 51338658 51382629 3 1 3 2
I understand how to do step (3), however I'm a bit stuck on how to attempt (1) and (2). Advice?
Upvotes: 2
Views: 70
Reputation: 294278
A defaultdict
approach
This might actually be quicker for OP's purposes.
If speed is important, please check. YMMV.
from collections import defaultdict
d = defaultdict(lambda: defaultdict(dict))
cols = 'tx_time rx_time src dest txid hops'.split()
for t in df.itertuples():
i = (t.src, t.dest, t.txid)
d[t.txrx.lower() + '_time'][i] = t.time
if pd.notnull(t.hops):
d['hops'][i] = int(t.hops)
pd.DataFrame(d).rename_axis(['src', 'dest', 'txid']) \
.reset_index().reindex_axis(cols, 1)
tx_time rx_time src dest txid hops
0 34355146 34373907 2 1 1 1
1 44284813 44302557 2 1 2 1
2 50058251 50075994 2 1 3 1
3 44596500 44630682 3 1 2 2
4 51338658 51382629 3 1 3 2
Upvotes: 3
Reputation: 38415
Using concat though I think @Wen's solution using pivot would be more efficient
df_tx = df[::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'tx_time'})
df_rx = df[1::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'rx_time'})
pd.concat([df_tx, df_rx ], axis = 1).T.drop_duplicates().T.dropna(1)
You get
tx_time src dest txid rx_time hops
0 34355146.0 2.0 1.0 1.0 34373907.0 1.0
1 44284813.0 2.0 1.0 2.0 44302557.0 1.0
2 44596500.0 3.0 1.0 2.0 44630682.0 2.0
3 50058251.0 2.0 1.0 3.0 50075994.0 1.0
4 51338658.0 3.0 1.0 3.0 51382629.0 2.0
Upvotes: 2
Reputation: 323266
By using pivot_table
df.bfill().pivot_table(index=['src','dest','txid','hops'],columns=['txrx'],values='time').reset_index()
Out[766]:
txrx src dest txid hops RX TX
0 2 1 1 1.0 34373907 34355146
1 2 1 2 1.0 44302557 44284813
2 2 1 3 1.0 50075994 50058251
3 3 1 2 2.0 44630682 44596500
4 3 1 3 2.0 51382629 51338658
Or using unstack
df.bfill().set_index(['src','dest','txid','hops','txrx']).time.unstack(-1).reset_index()
Out[768]:
txrx src dest txid hops RX TX
0 2 1 1 1.0 34373907 34355146
1 2 1 2 1.0 44302557 44284813
2 2 1 3 1.0 50075994 50058251
3 3 1 2 2.0 44630682 44596500
4 3 1 3 2.0 51382629 51338658
PS:rename by using .rename(columns={})
I did not add here , cause will making the code too long ...
Upvotes: 3