Indexing into array columns of a pandas DataFrame

Question

I have a pandas DataFrame that contains some array columns. What is the recommended way to index into several of these columns, each at a different position? For example, I need the second element of each array in column l and the first element of each array in column a. The result should be a new DataFrame. An array column can contain either Python lists or NumPy arrays, but this probably does not matter.

I have three solutions, but I don't really like any of them.

import numpy as np
import pandas as pd

df = pd.DataFrame({'l': [[1, 2, 4], [3, 2, 0, 10]],
                   'a': [np.array(["foo", "bar", "baz"]), np.array(["qux", "quux"])],
                   'dontcare': [10, 20]})

               l                a  dontcare
0      [1, 2, 4]  [foo, bar, baz]        10
1  [3, 2, 0, 10]      [qux, quux]        20

Solution 1, with str and join

df['l'].str[1].to_frame('l').join(df['a'].str[0])

   l    a
0  2  foo
1  2  qux
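
For more than two columns, the join chain in Solution 1 could probably be replaced by a dict comprehension over the same .str accessor. This is only a sketch: the positions mapping is a name made up for illustration, and it assumes every requested column holds list-like values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'l': [[1, 2, 4], [3, 2, 0, 10]],
                   'a': [np.array(["foo", "bar", "baz"]), np.array(["qux", "quux"])],
                   'dontcare': [10, 20]})

# Hypothetical mapping: column name -> position to take from each array.
positions = {'l': 1, 'a': 0}

# .str[pos] indexes into each element of the column, as in Solution 1.
# Building the frame from a dict of Series keeps the original row index.
result = pd.DataFrame({col: df[col].str[pos] for col, pos in positions.items()})
print(result)
```

As far as I know, .str[pos] gives NaN instead of raising when an array is too short, which may or may not be what you want.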

Solution 2, with apply and a new Series per row

df.apply(lambda row: pd.Series([row['l'][1], row['a'][0]], index=['l', 'a']), axis=1)

Solution 3, with apply and result_type='broadcast'

df[['l', 'a']].apply(lambda row: [row['l'][1], row['a'][0]], axis=1, result_type='broadcast')

We can assume that the output column names match the input column names and that we don't need multiple elements from any array column.
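
Under those assumptions, one possible alternative that avoids both the .str accessor and a row-wise apply is a plain list comprehension per column. This is a minimal sketch, not a recommended pandas idiom; positions is again a hypothetical mapping introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'l': [[1, 2, 4], [3, 2, 0, 10]],
                   'a': [np.array(["foo", "bar", "baz"]), np.array(["qux", "quux"])],
                   'dontcare': [10, 20]})

# Hypothetical mapping: column name -> position to take from each array.
positions = {'l': 1, 'a': 0}

# Plain integer indexing works for Python lists and NumPy arrays alike.
# Passing index=df.index keeps the rows aligned with the input frame.
result = pd.DataFrame(
    {col: [seq[pos] for seq in df[col]] for col, pos in positions.items()},
    index=df.index,
)
print(result)
```

Unlike the .str version, this raises an IndexError if some array is shorter than the requested position.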
