Reputation: 93
I'm trying to pull the wind direction out of a METAR string with this format:
EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019
I'm using this to extract the wind direction. It works for most of my data but fails on some strings:
df["Wind_Dir"] = df.metar.apply(lambda x: int(re.search(r"\s\d*KT\s", x).group().strip()[:3]))
I'd like to inspect the METAR strings that it's failing on, so instead of pulling group() out of the re.search result, I just applied the search as follows to get the re.Match object:
df["Wind_Dir"] = df.metar.apply(lambda x: re.search(r"\s\d*KT\s", x))
I've tried filtering by type and by null, but neither works.
Any help would be appreciated.
Thanks for your answers. Unfortunately, I can't mark both as solutions, even though I used both to solve my problem.
In the end I changed my regex to:
df["Wind_Dir"] = df.metar.str.findall(r"Z\s\d\d\d|Z\sVRB")
to also match a variable wind direction, but I wouldn't have been able to find that without df.metar.str.contains().
Upvotes: 3
Views: 2506
Reputation: 796
You are looking for pandas.Series.str.contains, which returns a Boolean mask that is True at every index where the pattern matches, based on re.search.
As the pandas documentation states, if you want a mask based on re.match instead, you should use pandas.Series.str.match.
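A minimal sketch of using that mask to find the strings the original lambda fails on. The sample DataFrame is made up for illustration (the second and third reports are hypothetical), and the pattern is the one from the question:

```python
import pandas as pd

# Hypothetical sample data: the second report has no wind group,
# the third uses VRB (variable direction) instead of three digits.
df = pd.DataFrame({
    "metar": [
        "EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019",
        "EGAA 010050Z 9999 FEW029 04/04 Q1019",
        "EGAA 010120Z VRB02KT 9999 FEW029 04/04 Q1019",
    ]
})

pattern = r"\s\d*KT\s"

# True where the pattern is found anywhere in the string (re.search semantics).
has_wind = df.metar.str.contains(pattern)

# Invert the mask to keep only the rows where re.search would return None.
failing = df[~has_wind]
print(failing)
```

The inverted mask gives back the original rows, so the offending METAR strings can be read directly.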
You can also use pandas.Series.str.extract, which extracts the capture groups of the first match of the pattern from every row of the Series. Rows that don't contain the pattern are filled with NaN, so you can filter on NaN values to retrieve those rows.
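As a rough illustration, assuming the direction is always the first three digits of a purely numeric group ending in KT. The capture-group pattern below is an assumption and differs slightly from the one in the question, and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: only the first report has a numeric wind group.
df = pd.DataFrame({
    "metar": [
        "EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019",
        "EGAA 010050Z 9999 FEW029 04/04 Q1019",
        "EGAA 010120Z VRB02KT 9999 FEW029 04/04 Q1019",
    ]
})

# str.extract needs a capture group; expand=False returns a Series
# when the pattern has a single group.
df["Wind_Dir"] = df.metar.str.extract(r"\s(\d{3})\d+KT\s", expand=False)

# Rows without a match hold NaN, so isna() selects them for inspection.
unmatched = df[df["Wind_Dir"].isna()]
print(unmatched.metar.tolist())
```

The extracted values are strings, so they still need converting to a numeric type once the unmatched rows have been dealt with.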
Upvotes: 4
Reputation: 12054
Your code needs to return the matched string, not an re.Match object.
It will also fail when there is no match, because re.search returns None in that case.
In your case, try this:
df['Wind_Dir'] = df.metar.str.findall(r"\s\d*KT\s")
df["Wind_Dir"] = df['Wind_Dir'].apply(lambda x: x[0].strip()[:3])
You might also want to check whether there is a match before executing the second statement, since findall returns an empty list when nothing matches.
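One possible way to add that check, sketched on made-up sample data. Returning None for unmatched rows is an assumption here; any other placeholder would work as well:

```python
import pandas as pd

# Hypothetical sample data: the second and third reports do not match the pattern.
df = pd.DataFrame({
    "metar": [
        "EGAA 010020Z 33004KT 300V010 9999 FEW029 04/04 Q1019",
        "EGAA 010050Z 9999 FEW029 04/04 Q1019",
        "EGAA 010120Z VRB02KT 9999 FEW029 04/04 Q1019",
    ]
})

# findall returns a list per row; the list is empty when nothing matches.
matches = df.metar.str.findall(r"\s\d*KT\s")

# Only index into the list when it is non-empty, otherwise store None.
df["Wind_Dir"] = matches.apply(lambda m: m[0].strip()[:3] if m else None)
print(df["Wind_Dir"].tolist())
```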
Upvotes: 1