Reputation: 1919
I have a pandas DataFrame df that contains a list of filenames. Here is an example:
print(df)
>>
+---------+---------+
| ID| Field|
+---------+---------+
| AAA.png| X|
| BBB.jpg| Y|
| CCC.png| Z|
+---------+---------+
From a given ID, which is the filename without the extension, I want to retrieve the value of the column Field. For example, for my_id = BBB, I want to get the value Y.
To do so, I tried the following:
my_id = "BBB"
field_value = df[df["ID"].str.split('.')[0] == my_id]["Field"]
But I get the error KeyError: False. I understand why I get this error, but I don't know how to do this another way.
Upvotes: 3
Views: 1252
Reputation: 862771
First filter by boolean indexing with DataFrame.loc; the output is a Series:
field_value = df.loc[df["ID"].str.split('.').str[0] == my_id, "Field"]
Then, to get the first value, use next with iter:
first_val = next(iter(field_value), 'no match')
If you need all matched values in a list:
L = field_value.tolist()
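To make the difference from the attempt in the question concrete, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the table printed in the question (an assumption, since the real df is not shown), and the names parts, first_row, stems and all_vals are only illustrative.

```python
import pandas as pd

# Example frame rebuilt from the table in the question (assumed data).
df = pd.DataFrame({"ID": ["AAA.png", "BBB.jpg", "CCC.png"],
                   "Field": ["X", "Y", "Z"]})
my_id = "BBB"

parts = df["ID"].str.split(".")   # Series of lists, one list per row

# parts[0] is a label lookup: it returns the list stored for row 0 only.
# Comparing that list to my_id gives a single bool, and df[False] is
# what raises KeyError: False in the question.
first_row = parts[0]              # ['AAA', 'png']

# parts.str[0] takes element 0 of every list, giving one stem per row.
stems = parts.str[0]
field_value = df.loc[stems == my_id, "Field"]

first_val = next(iter(field_value), "no match")  # first match or a default
all_vals = field_value.tolist()                  # every match as a list
print(first_val, all_vals)  # Y ['Y']
```

The default passed to next is what keeps the lookup from raising StopIteration when no row matches.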
Upvotes: 3
Reputation: 46351
I tested with str.contains:
my_id="BBB"
field_values = df.loc[df["ID"].str.contains(my_id), "Field"]
print(field_values)
It can return multiple values, as you can see. It is also bulletproof for file names starting with a dot (.), such as .AAA.png.
        ID Field
0  AAA.png     X
1  BBB.jpg     Y
2  CCC.png     Z
3  BBB.png     K
1    Y
3    K
Name: Field, dtype: object
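One thing to keep in mind is that str.contains searches for my_id anywhere in the filename and treats it as a regular expression by default, so it can also match names that merely contain the ID. The sketch below uses hypothetical filenames (.BBB.png and ABBBA.png are not from the question) to show the difference. The anchored pattern is only one possible way to tighten the match, under the assumption that a single optional leading dot should be tolerated.

```python
import re
import pandas as pd

# Hypothetical data: the last row only contains "BBB" as a substring.
df = pd.DataFrame({"ID": ["AAA.png", "BBB.jpg", ".BBB.png", "ABBBA.png"],
                   "Field": ["X", "Y", "K", "W"]})
my_id = "BBB"

# Plain substring search: "ABBBA.png" matches as well.
loose = df.loc[df["ID"].str.contains(my_id, regex=False), "Field"].tolist()

# Anchored pattern: optional leading dot, the escaped ID, then one extension.
pattern = rf"^\.?{re.escape(my_id)}\.[^.]+$"
strict = df.loc[df["ID"].str.contains(pattern), "Field"].tolist()

print(loose)   # ['Y', 'K', 'W']
print(strict)  # ['Y', 'K']
```

re.escape matters if the ID can contain characters such as . or +, which would otherwise be interpreted as regex syntax.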
Upvotes: 1
Reputation: 82765
Using os.path.splitext:
Example:
import os
import pandas as pd
df = pd.DataFrame({"ID": ["AAA.png", "BBB.png", "CCC.png"],
                   "Field": ["X", "Y", "Z"]})
my_id = "BBB"
mask = df["ID"].apply(os.path.splitext).str[0] == my_id
print(df[mask]["Field"])
Output:
1    Y
Name: Field, dtype: object
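As a rough comparison with splitting on the first dot, the sketch below runs both approaches on a few hypothetical filenames (my.file.png and .AAA.png are not from the question). os.path.splitext strips only the last extension and leaves a leading dot in place, whereas split('.')[0] keeps only the text before the first dot.

```python
import os
import pandas as pd

# Hypothetical filenames chosen to show where the two approaches differ.
ids = pd.Series(["BBB.png", "my.file.png", ".AAA.png"])

# Everything before the first dot.
by_split = ids.str.split(".").str[0].tolist()

# Everything before the last extension, as computed by os.path.splitext.
by_splitext = ids.apply(lambda name: os.path.splitext(name)[0]).tolist()

print(by_split)     # ['BBB', 'my', '']
print(by_splitext)  # ['BBB', 'my.file', '.AAA']
```

Which behavior is right depends on how the IDs are defined: if an ID may itself contain dots, the splitext version is the one that preserves it.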
Upvotes: 0