Reputation: 195
In my dataframe, I have a column that will contain a date. It should only be accepted if the format is 'YYYYMMDD' or 'MMDD'. In addition, if the format is 'MMDD', the year should be taken from another column and appended into the MMDD date like so;
df['YYYYMMDD'] = df['YYYY'].astype(str) + df['Date'].astype(str).apply(lambda x: x.zfill(4))
After the append, the old column is deleted and the new column renamed to the desired output.
For the match, I've tried regex (df_ori['Date'].str.matches(r'^\d{8}$')
) but I get the error;
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
.
I try df_ori['Date'].astype(str).str.matches(r'^\d{8}$')
and it gives the error;
'StringMethods' object has no attribute 'matches'
I think I'm just going about this the wrong way really. Any help is appreciated.
df.head() as requested:
YYYY MMDD
0 2016 525
1 2016 728
2 2014 821
3 2016 311
4 2016 422
Upvotes: 1
Views: 2266
Reputation: 863266
You need str.match
and str.zfill
:
df['YYYYMMDD'] = df['YYYY'].astype(str) + df['Date'].astype(str).str.zfill(4)
print (df_ori['YYYYMMDD'].astype(str).str.match(r'^\d{8}$'))
0 True
1 True
2 True
3 True
4 True
Name: YYYYMMDD, dtype: bool
If want 4
or 8
match:
print (df_ori['YYYYMMDD'].astype(str).str.match(r'^\d{8}$|^\d{4}$'))
If want from 4 to 8 match:
print (df_ori['YYYYMMDD'].astype(str).str.match(r'^\d{4,8}$'))
EDIT:
If need append only if len is 4
:
print (df_ori)
YYYY MMDD YYYYMMDD
0 2016 525 20160525
1 2016 728 20160728
2 2014 1121 1121
3 2016 1211 2211
4 2016 422 20160422
a = df_ori['YYYY'].astype(str) + df_ori['YYYYMMDD'].astype(str)
m = df_ori['YYYYMMDD'].astype(str).str.len() == 4
df_ori['YYYYMMDD'] = df_ori['YYYYMMDD'].mask(m, a)
print (df_ori)
YYYY MMDD YYYYMMDD
0 2016 525 20160525
1 2016 728 20160728
2 2014 1121 20141121
3 2016 1211 20162211
4 2016 422 20160422
Upvotes: 1