Reputation: 35
I currently have a pandas data frame that reads like this:
0 (dev_id='A', accon_time='B', start_time='C',end_time='D')
1 (dev_id='E', accon_time='F', start_time='G',end_time='H')
2 (dev_id='I', accon_time='J', start_time='K',end_time='L')
The current shape of this data frame is (574,1) when I actually want it to be (574,4) where each of the 4 comma separated values in each row are actually split between 4 separate columns.
Is there any way to do this?
I've tried converting my query into a pandas series first and then using Series.str.split however the results are the same as the original data frame.
ser = pd.Series(qry)
ser.str.rsplit(pat=",", n=4, expand=True)
print(ser)
df = pd.DataFrame(data=ser)
print(df)
This is what i'm using to query my data:
class Trip(Base):
__tablename__ = 'trip'
dev_id = Column(String(50), primary_key=True)
accon_time = Column(Integer)
start_time = Column(Integer)
end_time = Column(Integer)
def __repr__(self):
return "(dev_id='%s', accon_time='%s', start_time='%s',end_time='%s')"
% (self.dev_id, self.accon_time, self.start_time, self.end_time)
qry = session.query(Trip).\
filter(Trip.accon_time.between(20190620000000, 20190621000000)).\
filter(Trip.start_time <= 20190620145813).\
filter(Trip.end_time <= 20190620151400).\
filter(Trip.end_time >= 20190620145600)
This returns a list like this:
(dev_id='A', accon_time='B', start_time='C',end_time='D'),(dev_id='E', accon_time='F', start_time='G',end_time='H'),(dev_id='I', accon_time='J', start_time='K',end_time='L')
Converting my query results into a pandas data frame
df = pd.DataFrame(data=qry)
print(df)
Upvotes: 0
Views: 208
Reputation: 1677
In your parsing example ser.str.rsplit(pat=",", n=4, expand=True)
returns the output of ser, you need to catch the output or it wont do anything
Try this for the parsing:
qry = ["(dev_id='A', accon_time='B', start_time='C',end_time='D')",
"(dev_id='E', accon_time='F', start_time='G',end_time='H')",
"(dev_id='I', accon_time='J', start_time='K',end_time='L')"]
ser = pd.Series(qry)
df = ser.apply(lambda x: pd.Series([val.split('=')[1] for val in x[1:-1].split(',')]))
df.columns = ['dev_id', 'accon_time', 'start_time', 'end_time']
for each row of ser .appy()
I take the string and remove the paranteses x[1:-1]
then split on the commas .split(',')
this will give me a list of key value literals (i.e. ["dev_id='A'", " accon_time='B'", " start_time='C'", "end_time='D'"]
). Then for each literal I split it on '=' and return the second element which is the actual value .split('=')[1]
.
if you don't want the "'
" in your elements strip it at the end with .strip('\'')
ser = ser.apply(lambda x:[val.split('=')[1].strip('\'') for val in x[1:-1].split(',')])
output:
dev_id accon_time start_time end_time
0 'A' 'B' 'C' 'D'
1 'E' 'F' 'G' 'H'
2 'I' 'J' 'K' 'L'
Upvotes: 1