How to access individual tuple values stored in a Series?

Question

I have a dataframe that contains a tuple in each cell.

import pandas as pd
inp = [[(11,110), (12,120)], 
       [(13,130), (14,140), (15,150)]]
df = pd.DataFrame(inp)

for index, row in df.iterrows():
    print(row)

I want to access each element while iterating over the rows. As you can see, iterrows() returns each row as a Series of tuples, not the individual values inside them. For example, it gives me (11, 110) ... (15, 150). I want to split them into single integers.

The desired outcome should let me access the individual values of each tuple by index while iterating over the rows. For example, during row iteration I should be able to get 11, 12, 13, 14, 15 from index [0] and 110, 120, 130, 140, 150 from index [1].

Is it possible to do this within iterrows()?

Thanks in advance!

PaSTE · Accepted Answer

First of all, only use DataFrame.iterrows() as a last resort. DataFrames are optimized for vectorized operations on entire columns at once, not for row-by-row operations. And if you must iterate, consider using DataFrame.itertuples() instead because it preserves the data type of each column and runs much, much faster.
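To make the data type point concrete, here is a minimal sketch on a hypothetical two-column frame (the name demo and its columns are made up for illustration, they are not part of your data). iterrows() packs each row into a single Series, so the integer column gets upcast to float, while itertuples() keeps the integer as an integer:

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
demo = pd.DataFrame({"n": [1, 2], "value": [0.5, 1.5]})

# iterrows() builds one Series per row, so both values share a single dtype.
_, first_series = next(demo.iterrows())
print(first_series.dtype)        # the common dtype of the mixed row
print(type(first_series["n"]))   # a float type, no longer an integer

# itertuples() yields namedtuples, so each field keeps its column's type.
first_tuple = next(demo.itertuples(index=False))
print(type(first_tuple.n))       # still an integer type
```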

Second, it is important in Pandas (and all of computing, really) to structure your data appropriately for the task at hand. Your current solution has persons along the index and time points as the columns. That makes for a wide, ragged matrix with potentially many NaNs, as your example shows. It sounds like you want to store four elements of data for each cell of your DataFrame: person, time, x, and y. Consider using four columns instead of one column per time point, like so:

import pandas as pd
inp = [[(11,110), (12,120)], 
       [(13,130), (14,140), (15,150)]]
df = pd.DataFrame(inp)  # ragged and wide--not ideal for Pandas

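# dropna() discards the empty padding cell of the shorter row in case stack() keeps it (newer pandas versions may)
df2 = df.stack().dropna()  # now each element is indexed by a MultiIndex (person and time).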
df2.index.rename(["person", "time"], inplace=True)  # to be explicit

df3 = pd.DataFrame(df2.tolist(), index=df2.index)  # now each row is a person/time and there are two columns for x and y
df3.reset_index(inplace=True)  # not strictly necessary
df3.rename(columns={0: "x", 1: "y"}, inplace=True)  # to be explicit

for row in df3.itertuples():  # using itertuples instead of iterrows
    print(row)
# Pandas(Index=0, person=0, time=0, x=11, y=110)
# Pandas(Index=1, person=0, time=1, x=12, y=120)
# Pandas(Index=2, person=1, time=0, x=13, y=130)
# Pandas(Index=3, person=1, time=1, x=14, y=140)
# Pandas(Index=4, person=1, time=2, x=15, y=150)

You should take a look at this answer for how I split the tuples. Of course, if you have the ability to control how the data are being constructed, you do not need to do this kind of manipulation--just create the DataFrame with the appropriate structure in the first place.
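If you do control construction, one possible way to build the long layout directly from the nested list is sketched below. The names records and long_df are only illustrative, and the person/time/x/y labels are the same assumed meanings used above:

```python
import pandas as pd

inp = [[(11, 110), (12, 120)],
       [(13, 130), (14, 140), (15, 150)]]

# One record per (person, time) pair; x and y are unpacked from each tuple.
records = [
    {"person": person, "time": time, "x": x, "y": y}
    for person, row in enumerate(inp)
    for time, (x, y) in enumerate(row)
]

long_df = pd.DataFrame(records)  # no ragged rows, so no padding cells to clean up
print(long_df)
```

Because the ragged wide frame is never created, there is nothing to stack or drop afterwards.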

Now you can treat df3["x"] and df3["y"] as pandas.Series objects for whatever you need to do:

for x in df3["x"]:
    print(x)
# 11
# 12
# 13
# 14
# 15

for y in df3["y"]:
    print(y)
# 110
# 120
# 130
# 140
# 150

print(df3["x"] * df3["y"]/5 + 1)
# 0    243.0
# 1    289.0
# 2    339.0
# 3    393.0
# 4    451.0
# dtype: float64
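
If you really have to keep the original wide frame and iterate row by row, a rough sketch of indexing into each tuple is shown below. The list names firsts and seconds are made up for illustration, and the isinstance check is one assumed way to skip the empty cell that pads the shorter row:

```python
import pandas as pd

inp = [[(11, 110), (12, 120)],
       [(13, 130), (14, 140), (15, 150)]]
df = pd.DataFrame(inp)

firsts, seconds = [], []
for row in df.itertuples(index=False):
    for cell in row:
        if not isinstance(cell, tuple):  # skip the padding in the shorter row
            continue
        firsts.append(cell[0])   # element at index 0 of the tuple
        seconds.append(cell[1])  # element at index 1 of the tuple

print(firsts)
# [11, 12, 13, 14, 15]
print(seconds)
# [110, 120, 130, 140, 150]
```

This works, but it loops in Python over every cell, so the restructured DataFrame is usually the better choice for anything beyond small data.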
