salhin

Reputation: 2652

Iterate over pairwise combinations of column names and row indices in pandas

Suppose I have the following pandas DataFrame:

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

I want to iterate over the pairwise combinations of column names and row indices to perform a certain operation on each cell. For example, for the pair x and y, replace the 3 with 'xy'. The desired output is:

>>> df
    x   y   z
x  xx  xy  xz
y  xy  yy  yz
z  xz  yz  zz

A naïve attempt that doesn't work:

# fails: range(0,2) yields single integers, which cannot be unpacked into i, j
for i, j in range(0,2):
    df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
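A minimal sketch of what the loop above seems to be aiming for. It uses itertools.product to yield every (row label, column label) pair, which avoids trying to unpack two names from a single range. Two assumptions: the frame is rebuilt from the values shown, and the two labels are joined in sorted order, because the desired output puts 'xy' in both row x / column y and row y / column x.

```python
from itertools import product

import pandas as pd

# The example frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=['x', 'y', 'z'], columns=['x', 'y', 'z'])

# Cast to object so string labels can replace the integers cleanly
df = df.astype(object)

# product() yields every (row label, column label) pair
for row, col in product(df.index, df.columns):
    # Assumption: sort the pair so ('y', 'x') and ('x', 'y') both become 'xy',
    # matching the symmetric desired output
    df.loc[row, col] = ''.join(sorted((row, col)))

print(df)
```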

Upvotes: 2

Views: 4672

Answers (5)

Scott Boston

Reputation: 153510

How about a simple one-liner using DataFrame.apply? apply passes each column as a Series whose index holds the row labels and whose name is the column label:

df.apply(lambda x: x.index+x.name)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz

Update: using the numpy.ufunc.outer method (here np.add.outer):

import numpy as np
import pandas as pd

pd.DataFrame(np.add.outer(df.index, df.columns), index=df.index, columns=df.columns)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
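A self-contained sketch that rebuilds the question's frame and checks that the two one-liners above produce the same labels. Both return a new DataFrame and leave df untouched, so assign the result back (df = ...) if you want to overwrite the numbers. The .to_numpy(dtype=object) calls are an addition to the answer's code: they hand np.add.outer plain object arrays instead of pandas Index objects, so the ufunc simply applies Python's + to each pair of strings.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=['x', 'y', 'z'], columns=['x', 'y', 'z'])

# Option 1: each column arrives as a Series; its index holds the row labels
# and its .name holds the column label
labels_apply = df.apply(lambda col: col.index + col.name)

# Option 2: np.add.outer concatenates every row label with every column label;
# object arrays make np.add fall back to Python string concatenation
labels_outer = pd.DataFrame(
    np.add.outer(df.index.to_numpy(dtype=object),
                 df.columns.to_numpy(dtype=object)),
    index=df.index, columns=df.columns)

print(labels_apply)
print(labels_outer)
```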

Upvotes: 10

Vikash Singh

Reputation: 14011

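df.set_value() is much faster for setting single cells (see the question "Set value for particular cell in pandas DataFrame" for why). Note that set_value was deprecated in pandas 0.21 and removed in pandas 1.0; on current versions, df.at[row, column] = value is the fast equivalent.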

import pandas as pd

data = [{'x': 1, 'y': 2, 'z': 3}, {'x': 4, 'y': 5, 'z': 6}, {'x': 7, 'y': 8, 'z': 9}]

df = pd.DataFrame.from_dict(data, orient='columns')

df = df.astype(str)

df

#       x   y   z
#    0  1   2   3
#    1  4   5   6
#    2  7   8   9


for idx in df.index:
    for column in df.columns:
        # df.at is the fast scalar setter; set_value was removed in pandas 1.0
        df.at[idx, column] = str(idx) + str(column)

df

output:

    x   y   z
0   0x  0y  0z
1   1x  1y  1z
2   2x  2y  2z

Note: set_value won't work if the column names are not unique (see https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3), so you will have to fix the non-unique column names separately.

If you don't care about the exact column names, you can prefix each one with its column number:

df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
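A small sketch of that renaming step on a made-up frame with a duplicated column label 'a'. Once every label is unique, each (row, column) pair addresses exactly one cell, and the loop uses df.at, the accessor pandas recommended in place of set_value when it was deprecated.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label 'a'
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'a', 'b']).astype(str)

# Prefix each column name with its position to make the labels unique
df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
print(list(df.columns))  # ['0_a', '1_a', '2_b']

# Each label now identifies a single column, so df.at sets a single cell
for idx in df.index:
    for column in df.columns:
        df.at[idx, column] = str(idx) + str(column)

print(df)
```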

Upvotes: 2

javidcf

Reputation: 59731

This should be fast, since it builds all the labels with array-wide NumPy operations rather than a Python loop:

import numpy as np

grid = np.meshgrid(df.columns.values.astype(str),
                   df.index.values.astype(str))
# np.char is the public name for the defchararray module (np.core is deprecated in NumPy 2)
result = np.char.add(*grid)

You can then assign result to either the same dataframe or another one.
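A sketch of that assignment step, putting result into a new DataFrame that keeps the original row and column labels (the frame is rebuilt from the question's values). It calls np.char.add for the element-wise string concatenation. With meshgrid(columns, index), each cell holds the column label followed by the row label, so row x / column y reads 'yx', the transpose of what the apply-based answer gives.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=['x', 'y', 'z'], columns=['x', 'y', 'z'])

# grid[0][i][j] is the j-th column label, grid[1][i][j] is the i-th row label
grid = np.meshgrid(df.columns.to_numpy(dtype=str),
                   df.index.to_numpy(dtype=str))
result = np.char.add(*grid)  # column label + row label in every cell

# Wrap the array in a new DataFrame that keeps the original labels
labels = pd.DataFrame(result, index=df.index, columns=df.columns)
print(labels)
```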

Upvotes: 1

Cory Madden

Reputation: 5193

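# df[i] selects a column by its label, so both lines below only run
# when the column labels are the integers 0, 1, ...
for i, col in enumerate(df.columns):
    print(df[i][col] + df[col][i])


df = pd.DataFrame(df[i][col] + df[col][i] for i, col in enumerate(df.columns))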

This way you can iterate over all the columns and paired rows dynamically without needing to know how many columns there are.

Upvotes: 0

Constructor

Reputation: 534

Is this what you are looking for?

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

>>> for i in range(3):
...     for j in range(3):
...         df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
...
>>> df
    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
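The same nested loop can be sketched without hard-coding range(3) by looping over the labels themselves, so it adapts to any frame size. The astype(object) call is an addition: recent pandas versions (2.1 and later) emit a FutureWarning when a string is written into an integer column, so the frame is converted to object dtype first.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=['x', 'y', 'z'], columns=['x', 'y', 'z'])

# Convert to object first so string values can be stored without an
# incompatible-dtype warning
df = df.astype(object)

# Loop over the labels themselves instead of positions from range(3)
for row in df.index:
    for col in df.columns:
        df.loc[row, col] = row + col

print(df)
```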

Upvotes: 0
