Collective Action
Collective Action

Reputation: 8009

pandas: return column values that begin with certain number(s)

I have the following df:

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/sic_naics_catcode.csv'
df= pd.read_csv(url, index_col=0)
df.head(3)

    SICcode     Catcode     Category    SICname                      MultSIC    2012 NAICS Code     2002to2007 NAICS
0   111         A1500   Wheat, corn, soybeans and cash grain    Wheat   X           111140           111140
1   112         A1600   Other commodities (incl rice, peanuts, honey)   X           111160           111160
2   115         A1500   Wheat, corn, soybeans and cash grain    Corn    X           111150           111150

I want to return all rows that begin with e.g., 531, or 92, or, in some cases, values that begin with 5416 through 5419 in column 2002to2007 NAICS.

I think that this must be quite easy. I'm familiar with (this is just a template) dz = df[(df['date'] > '01/03/2005') & (df['date'] < '01/03/2015')] type code, but I don't know about any 'wild-card' symbols that would allow me to enter a truncated value.

Any ideas?

Upvotes: 1

Views: 267

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210882

you can use the RegEx power for that:

df.loc[df['2002to2007 NAICS'].astype(str).str.contains(r'^(?:531|92|541[6-9])')]

will give you all values that begin with 531 or 92 or 5416-5419

Upvotes: 2

user2285236
user2285236

Reputation:

For values starting with 531 or 92:

df.loc[(df["2002to2007 NAICS"].astype(str).str.startswith("531")) | (df["2002to2007 NAICS"].astype(str).str.startswith("92"))]

For values starting with 5416:5419:

df.loc[df["2002to2007 NAICS"].astype(str).str.slice(0,4).isin([str(i) for i in range(5416, 5420)])]

Upvotes: 3

Related Questions