Reputation: 1150
I'm in the process of concatenating two related fields throughout a large dataset. I feel like I have most of what I need but can't concat the fields properly.
dataframe:
id| date1foo| time1bar| date2foo| time2bar| date3foo | time3bar
--|---------|---------|---------|---------|----------|--------
2 |1/4/2017 |01:03:45 |1/4/2017 |01:03:45 |1/4/2019 |12:44:45
3 |2/4/2017 |03:12:32 |2/4/2017 |03:16:23 |3/4/2019 |22:32:55
4 |2/5/2017 |04:11:54 |7/5/2017 |06:23:31 |2/19/2019 |19:03:11
5 |2/6/2017 |02:15:34 |9/15/2017|01:12:32 |3/15/2019 |11:11:11
6 |3/17/2017|04:44:12 |10/3/2017|07:19:52 |4/4/2019 |07:03:14
I need to replace these fields with new merged fields. So:
id| datetime1 | datetime2 | datetime3
--|------------------|------------------|------------------|
2 |1/4/2017 01:03:45 |1/4/2017 01:03:45 |1/4/2019 12:44:45
3 |2/4/2017 03:12:32 |2/4/2017 03:16:23 |3/4/2019 22:32:55
4 |2/5/2017 04:11:54 |7/5/2017 06:23:31 |2/19/2019 19:03:11
5 |2/6/2017 02:15:34 |9/15/2017 01:12:32|3/15/2019 11:11:11
6 |3/17/2017 04:44:12|10/3/2017 07:19:52|4/4/2019 07:03:14
I feel like I'm getting close with what I have below.
code:
pattern_date = re.compile("date[0-9]{1,2}foo")
pattern_time = re.compile("time[0-9]{1,2}bar")
cols_date = [pattern_date.match(x).group() for x in df.columns if
pattern_date.match(x) is not None]
cols_time = [pattern_time.match(x).group() for x in df.columns if
pattern_time.match(x) is not None]
df[cols_time] = df[cols_date].applymap(lambda x: str(x) + [i for i in df[cols_date]])
# renaming fields code would go here
What am I missing here? Is there a better way to do this? Any help would be much appreciated.
Thanks in advance!
Upvotes: 0
Views: 36
Reputation: 42916
We can use DatFrame.filter
and acces these columns by zipping them so we match date
and time
:
df_new = pd.DataFrame({'id':df.id.values})
for index, cols in enumerate(zip(df.filter(regex='^date').columns, df.filter(regex='^time').columns)):
df_new[f'datetime{index+1}'] = df[cols[0]] + ' ' + df[cols[1]]
print(df_new)
id datetime1 datetime2 datetime3
0 2 1/4/2017 01:03:45 1/4/2017 01:03:45 1/4/2019 12:44:45
1 3 2/4/2017 03:12:32 2/4/2017 03:16:23 3/4/2019 22:32:55
2 4 2/5/2017 04:11:54 7/5/2017 06:23:31 2/19/2019 19:03:11
3 5 2/6/2017 02:15:34 9/15/2017 01:12:32 3/15/2019 11:11:11
4 6 3/17/2017 04:44:12 10/3/2017 07:19:52 4/4/2019 07:03:14
What does DataFrame.filter
do exactly? It gives the columns back which match the regex:
print(df.filter(regex='^date'))
date1foo date2foo date3foo
0 1/4/2017 1/4/2017 1/4/2019
1 2/4/2017 2/4/2017 3/4/2019
2 2/5/2017 7/5/2017 2/19/2019
3 2/6/2017 9/15/2017 3/15/2019
4 3/17/2017 10/3/2017 4/4/2019
print(df.filter(regex='^time'))
time1bar time2bar time3bar
0 01:03:45 01:03:45 12:44:45
1 03:12:32 03:16:23 22:32:55
2 04:11:54 06:23:31 19:03:11
3 02:15:34 01:12:32 11:11:11
4 04:44:12 07:19:52 07:03:14
Note I used f-strings
which is only supported with Python > 3.5. If your Python version is lower, use the following:
df_new['datetime{}'.format(index+1)]
Upvotes: 1