CDickson
CDickson

Reputation: 195

Pandas - check if values in column match one of two formats

In my dataframe, I have a column that will contain a date. It should only be accepted if the format is 'YYYYMMDD' or 'MMDD'. In addition, if the format is 'MMDD', the year should be taken from another column and appended into the MMDD date like so;

  df['YYYYMMDD'] = df['YYYY'].astype(str) + df['Date'].astype(str).apply(lambda x: x.zfill(4))

After the append, the old column is deleted and the new column renamed to the desired output.

For the match, I've tried regex (df_ori['Date'].str.matches(r'^\d{8}$')) but I get the error;

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas.

I try df_ori['Date'].astype(str).str.matches(r'^\d{8}$') and it gives the error;

'StringMethods' object has no attribute 'matches'

I think I'm just going about this the wrong way really. Any help is appreciated.

df.head() as requested:

   YYYY  MMDD
0  2016   525
1  2016   728
2  2014   821
3  2016   311
4  2016   422 

Upvotes: 1

Views: 2266

Answers (1)

jezrael
jezrael

Reputation: 863266

You need str.match and str.zfill:

df['YYYYMMDD'] = df['YYYY'].astype(str) + df['Date'].astype(str).str.zfill(4)

print (df_ori['YYYYMMDD'].astype(str).str.match(r'^\d{8}$'))
0    True
1    True
2    True
3    True
4    True
Name: YYYYMMDD, dtype: bool

If want 4 or 8 match:

print (df_ori['YYYYMMDD'].astype(str).str.match(r'^\d{8}$|^\d{4}$'))

If want from 4 to 8 match:

print (df_ori['YYYYMMDD'].astype(str).str.match(r'^\d{4,8}$'))

EDIT:

If need append only if len is 4:

print (df_ori)
   YYYY  MMDD  YYYYMMDD
0  2016   525  20160525
1  2016   728  20160728
2  2014  1121      1121
3  2016  1211      2211
4  2016   422  20160422

a = df_ori['YYYY'].astype(str) + df_ori['YYYYMMDD'].astype(str)
m = df_ori['YYYYMMDD'].astype(str).str.len() == 4
df_ori['YYYYMMDD'] = df_ori['YYYYMMDD'].mask(m, a)
print (df_ori)
   YYYY  MMDD  YYYYMMDD
0  2016   525  20160525
1  2016   728  20160728
2  2014  1121  20141121
3  2016  1211  20162211
4  2016   422  20160422

Upvotes: 1

Related Questions