John Cres

Reputation: 33

Searching within lists in a pandas dataframe

I am trying to figure out the most efficient way to search within lists in a pandas DataFrame in Python. For example, I have the following rows in a DataFrame:

PLACES_DETECTED              PLACE_VALUES
[A14, A09, A08, A15, A03]    [-0.0369, -0.0065, 0.01, -0.0295, -0.0402]
[B10, B19, A18, A03, A14]    [-0.0641, 0.0419, -0.0196, 0.0747, -0.0]

I want to search for, say, A14 and get the place value for it, but I don't know whether there is a better way to do it than a for loop with many if statements.

Thanks for your help!

I tried df.loc[df['PLACES_DETECTED'] == 'A14'], which didn't work. Even if it had, it wouldn't give me the associated place value, since that is in another column of the DataFrame.

Upvotes: 2

Views: 58

Answers (3)

SomeDude

Reputation: 14228

If you convert the column values to NumPy arrays, it's easier to locate the matches (this assumes every list has the same length, so the result is a regular 2-D array).

import numpy as np

# Stack the per-row lists into 2-D arrays (one row per DataFrame row)
detected = np.array(df['PLACES_DETECTED'].tolist())
values = np.array(df['PLACE_VALUES'].tolist())

Now it's just:

values[detected == 'A14']

output:

array([-0.0369, -0.    ])
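
As a self-contained sketch of the above, here is the same idea run on the two sample rows from the question. The names `mask` and `row_idx` are only illustrative and not part of the original answer. It assumes all lists have equal length; with ragged lists NumPy cannot build a regular 2-D array and this approach would not apply.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "PLACES_DETECTED": [
        ["A14", "A09", "A08", "A15", "A03"],
        ["B10", "B19", "A18", "A03", "A14"],
    ],
    "PLACE_VALUES": [
        [-0.0369, -0.0065, 0.01, -0.0295, -0.0402],
        [-0.0641, 0.0419, -0.0196, 0.0747, -0.0],
    ],
})

# Stack the per-row lists into 2-D arrays of shape (n_rows, list_length)
detected = np.array(df["PLACES_DETECTED"].tolist())
values = np.array(df["PLACE_VALUES"].tolist())

# Boolean mask marking every cell equal to the place we are looking for
mask = detected == "A14"

# Values at the matching positions, in row-major order
matches = values[mask]

# Positional row numbers of the matches, useful for mapping back to df
row_idx = np.nonzero(mask)[0]
```

Keeping `row_idx` alongside `matches` makes it possible to tell which DataFrame row each value came from, which the bare `values[detected == 'A14']` result does not show.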

Upvotes: 1

Mikkel

Reputation: 318

You could also do the following:

def get_place_value(row, place):
    places = row['PLACES_DETECTED']
    values = row['PLACE_VALUES']
    if place in places:
        index = places.index(place)
        return values[index]
    return None

place_values = df.apply(lambda row: get_place_value(row, 'A14'), axis=1)

That would also get you the following output:

0   -0.0369
1   -0.0000
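
A possible variant of the same lookup, sketched here with a plain list comprehension over the two columns instead of `df.apply(..., axis=1)`. The helper name `lookup` and the third row (which has no A14) are hypothetical and only there to show the missing case. Like `get_place_value`, it returns the value for the first occurrence of the place only.

```python
import pandas as pd

def lookup(places, values, place):
    """Return the value paired with the first occurrence of `place`, or None."""
    try:
        return values[places.index(place)]
    except ValueError:
        # list.index raises ValueError when `place` is not in the list
        return None

df = pd.DataFrame({
    "PLACES_DETECTED": [
        ["A14", "A09", "A08", "A15", "A03"],
        ["B10", "B19", "A18", "A03", "A14"],
        ["B01", "B02"],  # hypothetical row without A14
    ],
    "PLACE_VALUES": [
        [-0.0369, -0.0065, 0.01, -0.0295, -0.0402],
        [-0.0641, 0.0419, -0.0196, 0.0747, -0.0],
        [0.5, 0.25],  # hypothetical values
    ],
})

# Walk both columns in parallel, one (places, values) pair per row
result = [
    lookup(places, values, "A14")
    for places, values in zip(df["PLACES_DETECTED"], df["PLACE_VALUES"])
]
```

Unlike the NumPy approach, this does not need the lists to have the same length in every row.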

Upvotes: 1

Andrej Kesely

Reputation: 195428

If the values in each list in the PLACES_DETECTED column are unique (no duplicates), you can create a temporary Series of dicts and then use the .str accessor:

tmp = df.apply(
    lambda x: dict(zip(x["PLACES_DETECTED"], x["PLACE_VALUES"])), axis=1
)

df["A14"] = tmp.str["A14"]
print(df)

Prints:

             PLACES_DETECTED                                PLACE_VALUES     A14
0  [A14, A09, A08, A15, A03]  [-0.0369, -0.0065, 0.01, -0.0295, -0.0402] -0.0369
1  [B10, B19, A18, A03, A14]    [-0.0641, 0.0419, -0.0196, 0.0747, -0.0] -0.0000

Or use .explode (exploding several columns at once requires pandas 1.3 or newer):

df = df.explode(["PLACES_DETECTED", "PLACE_VALUES"])
print(df[df.PLACES_DETECTED == "A14"])

Prints:

  PLACES_DETECTED PLACE_VALUES
0             A14      -0.0369
1             A14         -0.0
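
A minimal sketch of the explode approach that leaves the original `df` untouched, using the sample rows from the question. The names `long_df` and `hits` are illustrative only. It assumes pandas 1.3 or newer and that the two lists in each row have the same length, which multi-column explode requires.

```python
import pandas as pd

df = pd.DataFrame({
    "PLACES_DETECTED": [
        ["A14", "A09", "A08", "A15", "A03"],
        ["B10", "B19", "A18", "A03", "A14"],
    ],
    "PLACE_VALUES": [
        [-0.0369, -0.0065, 0.01, -0.0295, -0.0402],
        [-0.0641, 0.0419, -0.0196, 0.0747, -0.0],
    ],
})

# One output row per list element; the original row label is repeated in the index
long_df = df.explode(["PLACES_DETECTED", "PLACE_VALUES"])

# Select the values for A14; exploded columns are object dtype, so cast to float
hits = long_df.loc[long_df["PLACES_DETECTED"] == "A14", "PLACE_VALUES"].astype(float)
```

Because the exploded frame keeps the original index, `hits.index` tells you which row of `df` each value belongs to, and a place that appears more than once in a row would simply yield more than one hit for that index.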

Upvotes: 2
