TheChymera

Reputation: 17924

Make a 2D array from a pandas DataFrame

I have a pandas DataFrame with two columns: participant IDs and reaction times (note that one participant has more RT measurements than the other).

    ID RT
0  foo  1
1  foo  2
2  bar  3
3  bar  4
4  foo  1
5  foo  2
6  bar  3
7  bar  4
8  bar  4

I would like to get a 2D array from this, where every row contains the reaction times for one participant:

[[1,2,1,2],
 [3,4,3,4,4]]

If a ragged shape like that is not possible, any of the following ways of getting a regular a x b shape would be acceptable: fill the missing elements with NaN; truncate the longer rows to the length of the shortest row; or pad the shorter rows with repeats of their mean value.

I would go for whatever is easiest to implement.

I tried to sort this out with groupby and expected it to be very easy, but it all gets terribly messy :(

Upvotes: 0

Views: 1399

Answers (1)

HYRY

Reputation: 97261

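import io

import pandas as pd

# StringIO (not BytesIO) is needed for a str literal on Python 3
data = io.StringIO("""    ID RT
0  foo  1
1  foo  2
2  bar  3
3  bar  4
4  foo  1
5  foo  2
6  bar  3
7  bar  4
8  bar  4""")

# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(data, sep=r"\s+")

# Renumber each participant's RTs from 0, then spread them into columns (one row per ID)
df.groupby("ID").RT.apply(pd.Series.reset_index, drop=True).unstack()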

output:

     0  1  2  3   4
ID                 
bar  3  4  3  4   4
foo  1  2  1  2 NaN
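
The result above is still a DataFrame, while the question asks for a 2D array and lists three acceptable shapes. The following is a minimal sketch of how each could be obtained, not part of the original answer. It uses `cumcount` plus `pivot` as an alternative to the `apply`/`unstack` route, and the names `trial`, `wide`, `padded`, `truncated` and `mean_filled` are made up for illustration.

```python
import io

import pandas as pd

# Same data as in the question, without the index column
raw = """ID RT
foo 1
foo 2
bar 3
bar 4
foo 1
foo 2
bar 3
bar 4
bar 4"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Number each measurement within its participant (0, 1, 2, ...),
# then pivot so that every participant becomes one row.
# "trial" is a hypothetical helper column, not part of the original data.
df["trial"] = df.groupby("ID").cumcount()
wide = df.pivot(index="ID", columns="trial", values="RT")

# Option 1: keep NaN where a participant has fewer measurements
padded = wide.to_numpy(dtype=float)

# Option 2: truncate every row to the length of the shortest one
n_min = int(wide.notna().sum(axis=1).min())
truncated = wide.iloc[:, :n_min].to_numpy(dtype=int)

# Option 3: pad short rows with that participant's own mean RT
mean_filled = wide.apply(lambda row: row.fillna(row.mean()), axis=1).to_numpy()
```

Rows come out sorted by ID (bar before foo), so keep `wide.index` around if the row order matters. Note that a truly ragged array like the one in the question cannot be stored as a regular NumPy 2D array; a list of lists or one of the padded forms above is the usual workaround.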

Upvotes: 4
