I have a dataframe that's read from a csv. time node txrx src dest txid hops 0 34355146 2 TX 2 1 1 NaN 1 34373907 1 RX 2 1 1 1.0 2 44284813 2 TX 2 1 2 NaN 3 44302557 1 RX 2 1 2 1.0 4 44596500 3 TX 3 1 2 NaN 5 44630682 1 RX 3 1 2 2.0 6 50058251 2 TX 2 1 3 NaN 7 50075994 1 RX 2 1 3 1.0 8 51338658 3 TX 3 1 3 NaN 9 51382629 1 RX 3 1 3 2.0 I need to be able to create a new dataframe which takes the values in the TX/RX rows, to create a single row for each pair: Take the time from the 'time' column. If the value in 'txrx' is "TX", then put it into a 'tx_time' col, if the value is "RX" then put the value into the 'rx_time' col (within the row of the new dataframe). The value of 'hops' gets taken from the RX row. This is done for each ['src', 'dest', 'txid'] group. The 'node' column is ignored. The df should then look like this: tx_time rx_time src dest txid hops 0 34355146 34373907 2 1 1 1 1 44284813 44302557 2 1 2 1 2 44596500 44630682 3 1 2 2 3 50058251 50075994 2 1 3 1 4 51338658 51382629 3 1 3 2 I understand how to do step (3), however I'm a bit stuck on how to attempt (1) and (2). Advice?

Reputation: 967

Combining conditional row data into new dataframe

I have a dataframe that's read from a csv.

          time  node txrx  src  dest  txid  hops
0     34355146     2   TX    2     1     1   NaN
1     34373907     1   RX    2     1     1   1.0
2     44284813     2   TX    2     1     2   NaN
3     44302557     1   RX    2     1     2   1.0
4     44596500     3   TX    3     1     2   NaN
5     44630682     1   RX    3     1     2   2.0
6     50058251     2   TX    2     1     3   NaN
7     50075994     1   RX    2     1     3   1.0
8     51338658     3   TX    3     1     3   NaN
9     51382629     1   RX    3     1     3   2.0

I need to be able to create a new dataframe which takes the values in the TX/RX rows, to create a single row for each pair:

Take the time from the 'time' column. If the value in 'txrx' is "TX", then put it into a 'tx_time' col, if the value is "RX" then put the value into the 'rx_time' col (within the row of the new dataframe).
The value of 'hops' gets taken from the RX row.
This is done for each ['src', 'dest', 'txid'] group.
The 'node' column is ignored.

The df should then look like this:

      tx_time  rx_time  src  dest  txid  hops
0    34355146 34373907    2     1     1     1
1    44284813 44302557    2     1     2     1
2    44596500 44630682    3     1     2     2
3    50058251 50075994    2     1     3     1
4    51338658 51382629    3     1     3     2

I understand how to do step (3), however I'm a bit stuck on how to attempt (1) and (2). Advice?

Upvotes: 2

Answers (3)

piRSquared

Reputation: 294278

A defaultdict approach
This might actually be quicker for OP's purposes.
If speed is important, please check. YMMV.

from collections import defaultdict

d = defaultdict(lambda: defaultdict(dict))
cols = 'tx_time  rx_time  src  dest  txid  hops'.split()

for t in df.itertuples():
    i = (t.src, t.dest, t.txid)
    d[t.txrx.lower() + '_time'][i] = t.time
    if pd.notnull(t.hops):
        d['hops'][i] = int(t.hops)

pd.DataFrame(d).rename_axis(['src', 'dest', 'txid']) \
  .reset_index().reindex_axis(cols, 1)

    tx_time   rx_time  src  dest  txid  hops
0  34355146  34373907    2     1     1     1
1  44284813  44302557    2     1     2     1
2  50058251  50075994    2     1     3     1
3  44596500  44630682    3     1     2     2
4  51338658  51382629    3     1     3     2

Upvotes: 3

Vaishali

Reputation: 38415

Using concat though I think @Wen's solution using pivot would be more efficient

df_tx = df[::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'tx_time'})
df_rx = df[1::2].reset_index().drop(['index', 'txrx', 'node'], axis = 1).rename(columns = {'time': 'rx_time'})

pd.concat([df_tx, df_rx ], axis = 1).T.drop_duplicates().T.dropna(1)

You get

    tx_time     src dest    txid    rx_time     hops
0   34355146.0  2.0 1.0     1.0     34373907.0  1.0
1   44284813.0  2.0 1.0     2.0     44302557.0  1.0
2   44596500.0  3.0 1.0     2.0     44630682.0  2.0
3   50058251.0  2.0 1.0     3.0     50075994.0  1.0
4   51338658.0  3.0 1.0     3.0     51382629.0  2.0

Upvotes: 2

BENY

Reputation: 323266

By using pivot_table

df.bfill().pivot_table(index=['src','dest','txid','hops'],columns=['txrx'],values='time').reset_index()
Out[766]: 
txrx  src  dest  txid  hops        RX        TX
0       2     1     1   1.0  34373907  34355146
1       2     1     2   1.0  44302557  44284813
2       2     1     3   1.0  50075994  50058251
3       3     1     2   2.0  44630682  44596500
4       3     1     3   2.0  51382629  51338658

Or using unstack

df.bfill().set_index(['src','dest','txid','hops','txrx']).time.unstack(-1).reset_index()
Out[768]: 
txrx  src  dest  txid  hops        RX        TX
0       2     1     1   1.0  34373907  34355146
1       2     1     2   1.0  44302557  44284813
2       2     1     3   1.0  50075994  50058251
3       3     1     2   2.0  44630682  44596500
4       3     1     3   2.0  51382629  51338658

PS:rename by using .rename(columns={}) I did not add here , cause will making the code too long ...

Upvotes: 3

Combining conditional row data into new dataframe

Answers (3)

Related Questions