Reputation: 27
When I try to take an SQL Query generated from a pd.read_sql_query
to a dataframe using pd.DataFrame
my string values get converted to nan
.
I tried using dtypes to set the type of each column
SQL_Query = pd.read_sql_query('''SELECT [CircuitID], [Status],
[LatestJiraTicket], [MrcNew]
FROM CircuitInfoTable
WHERE ([Status] = 'Active')
OR ([Status] = 'Pending')
OR ([Status] = 'Planned')''', conn)
# print(SQL_Query)
cdf = pd.DataFrame(SQL_Query, columns=['CID', 'Status', 'JiraTicket', 'MrcNew'])
SQL Query output:
0 OH1004-01 ... NaN
1 OH1004-02 ... NaN
2 OH1005-01 ... NaN
3 OH1005-02 ... NaN
4 AL1001-01 ... NaN
5 AL1001-02 ... NaN
6 AL1007-01 ... NaN
7 AL1007-02 ... NaN
8 NC1001-01 ... NaN
9 NC1001-02 ... NaN
10 NC1001-03 ... NaN
11 NC1001-04 ... NaN
12 NC1001-05 ... NaN
13 NC1001-06 ... NaN
14 (ommited on purpose) ... 5200.0
15 MO001-02 ... NaN
16 OR020-01 ... 8000.0
17 MA004-01 ... 6500.0
18 MA004-02 ... 6500.0
19 OR004-01 ... 10500.0
20 (ommited on purpose) ... 3975.0
21 OR007-01 ... 2500.0
22 (ommited on purpose) ... 9200.0
23 (ommited on purpose) ... 15000.0
24 (ommited on purpose) ... 5750.0
25 CA1005-02 ... 47400.0
26 CA1005-03 ... 47400.0
27 CA1005-04 ... 47400.0
28 CA1005-05 ... 47400.0
29 CA1006-01 ... 0.0
DataFrame output:
CID Status JiraTicket MrcNew
0 nan Planned nan NaN
1 nan Planned nan NaN
2 nan Planned nan NaN
3 nan Planned nan NaN
4 nan Planned nan NaN
5 nan Planned nan NaN
6 nan Planned nan NaN
7 nan Planned nan NaN
8 nan Planned nan NaN
9 nan Planned nan NaN
10 nan Planned nan NaN
11 nan Planned nan NaN
12 nan Planned nan NaN
13 nan Planned nan NaN
14 nan Active nan 5200.0
15 nan Pending nan NaN
16 nan Pending nan 8000.0
17 nan Pending nan 6500.0
18 nan Pending nan 6500.0
19 nan Pending nan 10500.0
20 nan Active nan 3975.0
21 nan Pending nan 2500.0
22 nan Active nan 9200.0
23 nan Pending nan 15000.0
24 nan Active nan 5750.0
25 nan Pending nan 47400.0
26 nan Pending nan 47400.0
27 nan Pending nan 47400.0
28 nan Pending nan 47400.0
29 nan Pending nan 0.0
Upvotes: 0
Views: 496
Reputation: 107737
Basically, you are using columns argument incorrectly in pandas.DataFrame
where that arugment specifies columns to select in resulting output (not to rename). From your query there is no CID or JiraTicket and hence they migrate with all missing values.
Possibly you intended to rename columns. Consider renaming in either SQL with column aliases or in pandas with rename
or set_axis
:
SELECT [CircuitID] AS [CID],
[Status],
[LatestJiraTicket] AS JiraTicket,
[MrcNew]
FROM CircuitInfoTable
WHERE ([Status] = 'Active')
OR ([Status] = 'Pending')
OR ([Status] = 'Planned')
Pandas
cdf = (pd.read_sql_query(...original query...)
.rename(columns={'CircuitID': 'CID', 'LatestJiraTicket': 'JiraTicket'})
)
cdf = (pd.read_sql_query(...original query...)
.set_axis(['CID', 'Status', 'JiraTicket', 'MrcNew'], axis='columns', inplace=False)
)
Upvotes: 1