Pandas create a dataframe whose entries are relationships between rows of another dataframe?

Question

Suppose I have the following table in a csv.

 elementID | groupID | sequence
     abc   |    A    |    0
     dcv   |    A    |    1
     asd   |    B    |    1
     ccc   |    B    |    0
     abc   |    B    |    2

I imported this table in Pandas as a DataFrame.

From this DataFrame I need to create another one where each entry represents two two sequential elements of elementID.

For example something like the following

 elementID_1 | elementID_2 | groupID
     abc     |    dcv      |    A
     asd     |    abc      |    B
     ccc     |    asd      |    B

At first I thought of a multi-row selection but could not find out how to do that.

So I decided to do step by step creating one column per time. I created first the column elementID_2 as follows

new_df = pd.DataFrame()
new_df[["elementID_2", "sequence", "groupID"]] = old_df.loc[old_df.loc.sequence>1][["elementID", "sequence", "groupID"]]

Now to create the elementID_1 column I would like to match the entries of the old dataframe having assequence value, the sequence value of the new DataFrame minus 1 and the same groupdID between the two DataFrame.

How can I do that?

Quang Hoang · Accepted Answer

IIUC, you can do so by shift() on groups and dropna:

df.sort_values('sequence', inplace=True)
df['elementID_1'] = df.groupby('groupID').elementID.shift()
df.dropna()

Output:

+----+-------------+-----------+------------+---------------+
|    | elementID   | groupID   |   sequence | elementID_1   |
|----+-------------+-----------+------------+---------------|
|  1 | dcv         | A         |          1 | abc           |
|  2 | asd         | B         |          3 | ccc           |
|  4 | abc         | B         |          4 | asd           |
+----+-------------+-----------+------------+---------------+

Then you can drop sequence column and rename elementID to elementID_2.

Pandas create a dataframe whose entries are relationships between rows of another dataframe?

Answers (2)

Related Questions