Reputation: 3362
Let's say I have a dataframe that looks like this:
import pandas as pd
df2 = pd.DataFrame({'A': ['Apple, 10/01/2016, 31/10/18, david/kate', 'orange', 'pear', 'Apple', '10/01/2016', '02/20/2017'], 'file_name': ['a.txt', 'a.txt', 'b.txt', 'a.txt', 'd.txt', 'e.txt']})
>>> df2
                                         A  file_name
0  Apple, 10/01/2016, 31/10/18, david/kate      a.txt
1                                   orange      a.txt
2                                     pear      b.txt
3                                    Apple      a.txt
4                               10/01/2016      d.txt
5                               02/20/2017      e.txt
I would like to extract only the dates from column A, so that the output looks like this:
                      A  file_name
0  10/01/2016, 31/10/18      a.txt
1     Nothing to return      a.txt
2     Nothing to return      b.txt
3     Nothing to return      a.txt
4            10/01/2016      d.txt
5            02/20/2017      e.txt
Does anyone have any suggestions on how to do this? I am not sure where to begin.
Edit #1:
I edited my original dataframe and output results to better reflect what I am looking for.
Upvotes: 1
Views: 623
Reputation: 8273
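import re

def my_func(cell):
    # Keep every comma-separated piece that starts with number/number/number.
    dates = []
    for part in cell.split(","):
        match = re.match(r"\d+/\d+/\d+", part.strip())
        if match:
            dates.append(match.group(0))
    # Join the dates that were found, or fall back to the placeholder text.
    return ", ".join(dates) if dates else "Nothing to return"

df2.A = df2.A.apply(my_func)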
Output:
                      A  file_name
0  10/01/2016, 31/10/18      a.txt
1     Nothing to return      a.txt
2     Nothing to return      b.txt
3     Nothing to return      a.txt
4            10/01/2016      d.txt
5            02/20/2017      e.txt
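As a variation on the function above, here is a minimal sketch that lets re.findall scan the whole cell, so the split on commas is not needed. The names DATE_PATTERN and extract_dates are made up for illustration. The pattern only checks the number/number/number shape, not whether the value is a valid date. Unlike re.match on each piece, findall also picks up a date that sits in the middle of a piece.

```python
import re

# Hypothetical names; the pattern only checks the number/number/number shape.
DATE_PATTERN = re.compile(r"\d+/\d+/\d+")

def extract_dates(cell):
    """Return the date-like substrings of cell joined by ', ', or a placeholder."""
    dates = DATE_PATTERN.findall(cell)
    return ", ".join(dates) if dates else "Nothing to return"

print(extract_dates("Apple, 10/01/2016, 31/10/18, david/kate"))
print(extract_dates("orange"))
```

With df2 from the question, df2.A = df2.A.apply(extract_dates) should apply it to the whole column.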
Upvotes: 1
Reputation: 323316
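Use str.extractall to pull out every date, then join the matches for each row.
To keep the rows that contain no date, add reindex(df2.index).fillna('Nothing to return') at the end (see the update below).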
df2.A.str.extractall(r'(((?:\d+[/-])?\d+[/-]\d+))')[0].groupby(level=0).apply(','.join)
Out[459]:
0    10/01/2016,31/10/18
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object
Update
df2.A.str.extractall(r'(((?:\d+[/-])?\d+[/-]\d+))')[0].groupby(level=0).apply(','.join).reindex(df2.index).fillna('Nothing to return')
Out[463]:
0    10/01/2016,31/10/18
1      Nothing to return
2      Nothing to return
3      Nothing to return
4             10/01/2016
5             02/20/2017
Name: 0, dtype: object
Upvotes: 1
Reputation: 8132
This doesn't exactly match your desired output, but the structure is probably more useful and is easy to convert into what you want.
Basically this is a job for regex. This code should find anything of the form number/number/number:
s = df2["A"]
result = s.str.extractall(r"(\d+/\d+/\d+)")[0]
print(result)
   match
0  0        10/01/2016
   1          31/10/18
4  0        10/01/2016
5  0        02/20/2017
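One way to convert this long result into the layout asked for in the question, as a rough sketch: group the matches by the original row label, join them, then reindex so the rows without a match come back with the placeholder. The df2 below is rebuilt from the table shown in the question, and the variable name joined is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df2 = pd.DataFrame({
    "A": ["Apple, 10/01/2016, 31/10/18, david/kate", "orange", "pear",
          "Apple", "10/01/2016", "02/20/2017"],
    "file_name": ["a.txt", "a.txt", "b.txt", "a.txt", "d.txt", "e.txt"],
})

# One row per match, indexed by (original row label, match number).
result = df2["A"].str.extractall(r"(\d+/\d+/\d+)")[0]

# Join the matches of each original row into a single string.
joined = result.groupby(level=0).agg(", ".join)

# Bring back the rows that had no match and write the column in place.
df2["A"] = joined.reindex(df2.index, fill_value="Nothing to return")

print(df2)
```

Keeping result in its long form until the last step means the individual dates stay available, for example for counting matches per row, before they are collapsed into strings.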
Upvotes: 2