grigs

Reputation: 1150

Concatenate related fields and replace within data frame

I'm in the process of concatenating two related fields throughout a large dataset. I feel like I have most of what I need but can't concat the fields properly.

dataframe:

id| date1foo| time1bar| date2foo| time2bar| date3foo | time3bar
--|---------|---------|---------|---------|----------|--------
2 |1/4/2017 |01:03:45 |1/4/2017 |01:03:45 |1/4/2019  |12:44:45
3 |2/4/2017 |03:12:32 |2/4/2017 |03:16:23 |3/4/2019  |22:32:55
4 |2/5/2017 |04:11:54 |7/5/2017 |06:23:31 |2/19/2019 |19:03:11
5 |2/6/2017 |02:15:34 |9/15/2017|01:12:32 |3/15/2019 |11:11:11
6 |3/17/2017|04:44:12 |10/3/2017|07:19:52 |4/4/2019  |07:03:14

I need to replace these fields with new merged fields. So:

id| datetime1        | datetime2        | datetime3   
--|------------------|------------------|------------------
2 |1/4/2017 01:03:45 |1/4/2017 01:03:45 |1/4/2019 12:44:45
3 |2/4/2017 03:12:32 |2/4/2017 03:16:23 |3/4/2019 22:32:55
4 |2/5/2017 04:11:54 |7/5/2017 06:23:31 |2/19/2019 19:03:11
5 |2/6/2017 02:15:34 |9/15/2017 01:12:32|3/15/2019 11:11:11
6 |3/17/2017 04:44:12|10/3/2017 07:19:52|4/4/2019 07:03:14

I feel like I'm getting close with what I have below.

code:

import re

pattern_date = re.compile("date[0-9]{1,2}foo")
pattern_time = re.compile("time[0-9]{1,2}bar")

cols_date = [pattern_date.match(x).group() for x in df.columns if
        pattern_date.match(x) is not None]

cols_time = [pattern_time.match(x).group() for x in df.columns if
        pattern_time.match(x) is not None]


df[cols_time] = df[cols_date].applymap(lambda x: str(x) + [i for i in df[cols_date]])

# renaming fields code would go here

What am I missing here? Is there a better way to do this? Any help would be much appreciated.

Thanks in advance!

Upvotes: 0

Views: 36

Answers (1)

Erfan

Reputation: 42916

We can use DataFrame.filter and access these columns by zipping them, so each date column is matched with its corresponding time column:

df_new = pd.DataFrame({'id':df.id.values})

for index, cols in enumerate(zip(df.filter(regex='^date').columns, df.filter(regex='^time').columns)):
    df_new[f'datetime{index+1}'] = df[cols[0]] + ' ' + df[cols[1]]

print(df_new)
   id           datetime1           datetime2           datetime3
0   2   1/4/2017 01:03:45   1/4/2017 01:03:45   1/4/2019 12:44:45
1   3   2/4/2017 03:12:32   2/4/2017 03:16:23   3/4/2019 22:32:55
2   4   2/5/2017 04:11:54   7/5/2017 06:23:31  2/19/2019 19:03:11
3   5   2/6/2017 02:15:34  9/15/2017 01:12:32  3/15/2019 11:11:11
4   6  3/17/2017 04:44:12  10/3/2017 07:19:52   4/4/2019 07:03:14
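If you want true datetime columns rather than concatenated strings, a small variation of the loop above (a sketch, using pd.to_datetime on a hypothetical sample with the question's column layout) parses each combined column:

```python
import pandas as pd

# Hypothetical sample matching the question's column naming
df = pd.DataFrame({
    'id': [2, 3],
    'date1foo': ['1/4/2017', '2/4/2017'],
    'time1bar': ['01:03:45', '03:12:32'],
    'date2foo': ['1/4/2017', '2/4/2017'],
    'time2bar': ['01:03:45', '03:16:23'],
})

df_new = pd.DataFrame({'id': df['id'].values})
pairs = zip(df.filter(regex='^date').columns, df.filter(regex='^time').columns)
for index, (date_col, time_col) in enumerate(pairs, start=1):
    # Parse the concatenated string into a real datetime64 column
    df_new[f'datetime{index}'] = pd.to_datetime(df[date_col] + ' ' + df[time_col])

print(df_new.dtypes)
```

This gives you datetime64 columns you can sort, subtract, or resample on directly, instead of plain strings.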

What does DataFrame.filter do exactly? It returns the columns whose names match the regex:

print(df.filter(regex='^date'))
    date1foo   date2foo   date3foo
0   1/4/2017   1/4/2017   1/4/2019
1   2/4/2017   2/4/2017   3/4/2019
2   2/5/2017   7/5/2017  2/19/2019
3   2/6/2017  9/15/2017  3/15/2019
4  3/17/2017  10/3/2017   4/4/2019

print(df.filter(regex='^time'))

   time1bar  time2bar  time3bar
0  01:03:45  01:03:45  12:44:45
1  03:12:32  03:16:23  22:32:55
2  04:11:54  06:23:31  19:03:11
3  02:15:34  01:12:32  11:11:11
4  04:44:12  07:19:52  07:03:14
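Besides regex, DataFrame.filter also accepts like (substring match) and items (exact column names), which can be simpler when a regex is overkill. A quick sketch on a hypothetical frame with the question's naming scheme:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [2, 3],
    'date1foo': ['1/4/2017', '2/4/2017'],
    'time1bar': ['01:03:45', '03:12:32'],
})

# like= keeps columns whose name contains the substring
print(df.filter(like='date').columns.tolist())            # ['date1foo']

# items= keeps exactly the listed columns, in the given order
print(df.filter(items=['id', 'time1bar']).columns.tolist())  # ['id', 'time1bar']
```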

Note that I used f-strings, which are only supported in Python 3.6 and above. If your Python version is lower, use the following:

df_new['datetime{}'.format(index+1)]

Upvotes: 1
