Bilbo Swaggins

Reputation: 93

How to find rows in Pandas dataframe where re.search fails

I'm trying to pull wind direction out of a Metar string with format:

EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019

I'm using this to extract the wind direction, which works for most of my data but fails on some strings:

df["Wind_Dir"] = df.metar.apply(lambda x: int(re.search(r"\s\d*KT\s", x).group().strip()[:3]))

I'd like to inspect the Metar strings that it fails on. So instead of calling group() on the result of re.search, I applied just the search, as follows, to get the re.Match object:

df["Wind_Dir"] = df.metar.apply(lambda x: re.search(r"\s\d*KT\s", x))

I've tried filtering by type and by Null, but neither of those works.

Any help would be appreciated.


Thanks for your answers. Unfortunately, I can't mark them both as solutions, despite using both to solve my problem.

In the end I changed my regex to:

df["Wind_Dir"] = df.metar.str.findall(r"Z\s\d\d\d|Z\sVRB")

to also match variable wind direction (VRB), but I wouldn't have been able to find that without df.metar.str.contains().
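A minimal sketch of what that final pattern returns, using two made-up Metar strings (the sample data is an assumption, not from my real dataset). str.findall gives a list per row that still carries the "Z " prefix; the str.extract variant with a capture group is one hypothetical way to keep only the direction itself:

```python
import pandas as pd

# Hypothetical sample data: one fixed and one variable wind direction.
df = pd.DataFrame({
    "metar": [
        "EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019",
        "EGAA 010120Z VRB02KT 9999 FEW029 04/04 Q1019",
    ]
})

# The pattern from the post: each row becomes a list of matched substrings.
found = df["metar"].str.findall(r"Z\s\d\d\d|Z\sVRB")

# Assumed alternative: a capture group keeps only the direction part.
direction = df["metar"].str.extract(r"Z\s(\d{3}|VRB)", expand=False)

print(found.tolist())
print(direction.tolist())
```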

Upvotes: 3

Views: 2506

Answers (2)

SmileyProd

Reputation: 796

You are looking for pandas.Series.str.contains, which returns a boolean mask that is True for the rows where the pattern matches, based on re.search.

As the pandas documentation states, if you want a mask based on re.match you should use pandas.Series.str.match.

You can also use pandas.Series.str.extract, which extracts the first match of the pattern from every row of the Series (the pattern needs at least one capture group). Rows that don't contain the pattern are filled with NaN, so you can filter on NaN values to retrieve those rows.
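As a rough sketch of both approaches, assuming a small made-up DataFrame with a `metar` column (the sample strings and the variable names `failing` and `wind` are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; only the first report has a wind group
# that the original pattern can match.
df = pd.DataFrame({
    "metar": [
        "EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019",
        "EGAA 010050Z /////// 9999 FEW029 04/04 Q1019",
        "EGAA 010120Z VRB02KT 9999 FEW029 04/04 Q1019",
    ]
})

pattern = r"\s\d*KT\s"

# Boolean mask: True where the pattern is found anywhere in the string.
matched = df["metar"].str.contains(pattern, regex=True)

# Invert the mask to keep only the rows where the search fails.
failing = df[~matched]

# str.extract needs a capture group; rows without a match become NaN.
wind = df["metar"].str.extract(r"\s(\d{3})\d*KT\s", expand=False)
failing_via_extract = df[wind.isna()]

print(failing)
```

Both selections should pick out the same rows here; the str.contains mask is the more direct way to inspect failures, while str.extract gives you the extracted value and the failure marker in one pass.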

Upvotes: 4

Hima

Reputation: 12054

Your code needs to return the matched string, not a re.Match object.

It will also fail when there is no match, because re.search returns None in that case.

Try pandas.Series.str.findall.

In your case, try this:

df['Wind_Dir'] = df.metar.str.findall(r"\s\d*KT\s")
df["Wind_Dir"] = df['Wind_Dir'].apply(lambda x: x[0].strip()[:3])

You might also want to check whether there is a match before executing the second statement, since x[0] raises an IndexError on an empty list.
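One possible way to add that check, sketched on made-up sample data (the guard inside the lambda is an assumption about how you might want to handle missing matches, here by leaving them empty):

```python
import pandas as pd

# Hypothetical sample data; the second and third rows have no match.
df = pd.DataFrame({
    "metar": [
        "EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019",
        "EGAA 010050Z /////// 9999 FEW029 04/04 Q1019",
        "EGAA 010120Z VRB02KT 9999 FEW029 04/04 Q1019",
    ]
})

# findall returns a list per row; the list is empty when nothing matches.
matches = df["metar"].str.findall(r"\s\d*KT\s")

# Only index into the list when it is non-empty; otherwise store None.
df["Wind_Dir"] = matches.apply(lambda m: m[0].strip()[:3] if m else None)

# Rows with a missing Wind_Dir are the ones the pattern failed on.
failed_rows = df[df["Wind_Dir"].isna()]
print(failed_rows["metar"].tolist())
```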

Upvotes: 1
