Reputation: 57
I am trying to import multiple files between two dates into a Pandas DataFrame, but the resulting DataFrame contains multiple copies of the data instead of one.
My code looks like this:
Mu = pd.DataFrame()
lis = []
for date in daterange:
    path = 'Z:/directory/to/files' + date + '.txt'
    frame = pd.read_csv(path, delimiter=' ', skipinitialspace=True, usecols=[0, 1, 2, 3],
                        names=['date', 'time', 'type1', 'type2'],
                        parse_dates={'timestamp': ['date', 'time']})
    lis.append(frame)
Mu = pd.concat(lis, axis=0, ignore_index=True)
If I have files like this:
File A:
20170501 00:00:11 11 1
20170501 00:00:20 21 2
File B:
20170502 00:06:11 31 3
20170502 00:30:11 41 4
File C:
20170503 00:40:11 51 5
20170503 00:50:11 61 6
The resulting dataframe looks like this:
20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6
20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6
20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6
What I want is this:
20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6
How can I create the desired DataFrame?
Upvotes: 1
Views: 68
Reputation: 1003
You can use drop_duplicates:
Mu = Mu.drop_duplicates()
Output:
0 20170501 00:00:11 11 1
1 20170501 00:00:20 21 2
2 20170502 00:06:11 31 3
3 20170502 00:30:11 41 4
4 20170503 00:40:11 51 5
5 20170503 00:50:11 61 6
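For a runnable illustration, here is a minimal sketch that reproduces the tripled rows and then removes them with drop_duplicates. The in-memory `files` dict is a hypothetical stand-in for files A, B and C, and the repeated entries in `daterange` are only an assumption about how the copies might arise; the question does not show how `daterange` is built.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for files A, B and C from the question.
files = {
    "20170501": "20170501 00:00:11 11 1\n20170501 00:00:20 21 2\n",
    "20170502": "20170502 00:06:11 31 3\n20170502 00:30:11 41 4\n",
    "20170503": "20170503 00:40:11 51 5\n20170503 00:50:11 61 6\n",
}

# Assumption: the date list contains each date three times,
# which is one possible way to end up with three copies of every row.
daterange = ["20170501", "20170502", "20170503"] * 3

frames = []
for date in daterange:
    frame = pd.read_csv(
        io.StringIO(files[date]),
        delimiter=" ",
        skipinitialspace=True,
        usecols=[0, 1, 2, 3],
        names=["date", "time", "type1", "type2"],
    )
    frames.append(frame)

# Concatenate once, after the loop has collected every frame.
Mu = pd.concat(frames, axis=0, ignore_index=True)

# Keep the first occurrence of each row and renumber the index.
deduped = Mu.drop_duplicates().reset_index(drop=True)

# Alternative: remove repeated dates up front so no file is read twice.
unique_dates = list(dict.fromkeys(daterange))
```

Note that drop_duplicates also removes rows that are legitimately identical in the source files, so if repeated dates turn out to be the cause, deduplicating the date list before reading may be the safer choice.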
Upvotes: 3