Reputation: 5126
I have a dataframe as
df
indx pids
A 181718,
B 31718,
C 1718,
D 1235,3456
E 890654,
I want to return a row that matches 1718 exactly.
I tried to do this but as expected it returns rows where the 1718 is subset as well:
group_df = df.loc[df['pids'].astype(str).str.contains('{},'.format(1718)), 'pids']
indx pids
A 181718,
B 31718,
C 1718,
When I try to do something like this, it returns empty:
cham_geom = df.loc[df['pids'] == '1718', 'pids']
Expected output:
indx pids
C 1718,
Can anyone help me with it?
Upvotes: 5
Views: 2460
Reputation: 408
there is a method isin
that matches and returns a dataframe containing True for matched and false for not matching.
Consider the following example
>>> found = df.isin(["1718,"])==True
>>> df[found].head(3)
this will show the first 3 values matched with 1718
or if you want to match it with only 1 value then you can do so
>>> found = df.pids == "1718,"
>>> df[found].head(3)
Upvotes: 2
Reputation: 59549
Use str.contains
with a negative lookbehind, to ensure there are no other digits before '1718'
import pandas as pd
d = {'indx': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G'},
'pids': {0: '181718,', 1: '31718,', 2: '1718,', 3: '1235,3456', 4: '890654,', 5: '1231,1718', 6: '1231, 1718'}}
df = pd.DataFrame(d)
df.loc[df.pids.str.contains('(?<![0-9])1718')]
indx pids
2 C 1718,
5 F 1231,1718
6 G 1231, 1718
Upvotes: 1
Reputation: 75080
you can try with:
df[df.pids.replace('\D','',regex=True).eq('1718')]
indx pids
2 C 1718,
'\D' : Any character that is not a numeric digit from 0 to 9.
EDIT Considering the below df:
indx pids
0 A 181718,
1 B 31718,
2 C 1718,
3 D 1235,3456
4 E 890654,
5 F 3220,1718
executing:
df[df.pids.str.split(",").apply(lambda x: '1718' in x)]
#if not comma only:-> df[df.pids.str.split("\D").apply(lambda x: '1718' in x)]
Gives:
indx pids
2 C 1718,
5 F 3220,1718
Upvotes: 5