Reputation: 1
I have a df with strings and dates in different columns. One column holds a string and the one next to it an associated date. I'm trying to search for a specific string within the first column and if found, return the value of the second column. The problem is there's 15 of these columns in a df with 85 columns. Can this be accomplished?
Thanks!
so far I have:
df['New'] = df.apply(lambda row: row.astype(str).str.contains('Delayed').any(), axis=1)
to search within the string
Upvotes: 0
Views: 49
Reputation: 1680
Here is what an MCVE looks like. It includes the data as roughly described in the question. This is a busybox that can be used to see what goes on when the lambda is applied, and shows how to make things work.
import pandas as pd
from pandas.compat import StringIO
print(pd.__version__)
csvdata = StringIO("""date,LASTA,LASTB,LASTC
1999-03-15,2.5597,8.20145,16.900
1999-03-31,delayed,7.73057,16.955
1999-04-01,2.8321,7.63714,17.500
1999-04-06,2.8537,delayed,delayed""")
df = pd.read_csv(csvdata)
# debugging complex lambdas is sometimes better done by
# passing in a function to see what is going on
def row(x):
print(type(x))
match = x.str.contains('delayed').any()
return match
df['function_match'] = df.apply(row, axis=1)
df['lambda_match'] = df.apply(lambda row: row.str.contains('delayed').any(), axis=1)
# use the match column as a boolean mask, and then index by preferred column
print(df[df['lambda_match']]['LASTA'])
This produces
0.20.3
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
1 delayed
3 2.8537
Name: LASTA, dtype: object
Upvotes: 1