Patrick Carra
Patrick Carra

Reputation: 27

Converting pd.read_sql_query to pd.DataFrame converts strings to nan

When I try to take an SQL Query generated from a pd.read_sql_query to a dataframe using pd.DataFrame my string values get converted to nan.

I tried using dtypes to set the type of each column

SQL_Query = pd.read_sql_query('''SELECT [CircuitID], [Status], 
                                        [LatestJiraTicket], [MrcNew] 
                                  FROM CircuitInfoTable 
                                  WHERE ([Status] = 'Active') 
                                     OR ([Status] = 'Pending')
                                     OR ([Status] = 'Planned')''', conn)
# print(SQL_Query)
cdf = pd.DataFrame(SQL_Query, columns=['CID', 'Status', 'JiraTicket', 'MrcNew'])

SQL Query output:

0                                      OH1004-01  ...      NaN
1                                      OH1004-02  ...      NaN
2                                      OH1005-01  ...      NaN
3                                      OH1005-02  ...      NaN
4                                      AL1001-01  ...      NaN
5                                      AL1001-02  ...      NaN
6                                      AL1007-01  ...      NaN
7                                      AL1007-02  ...      NaN
8                                      NC1001-01  ...      NaN
9                                      NC1001-02  ...      NaN
10                                     NC1001-03  ...      NaN
11                                     NC1001-04  ...      NaN
12                                     NC1001-05  ...      NaN
13                                     NC1001-06  ...      NaN
14                          (ommited on purpose)  ...   5200.0
15                                      MO001-02  ...      NaN
16                                      OR020-01  ...   8000.0
17                                      MA004-01  ...   6500.0
18                                      MA004-02  ...   6500.0
19                                      OR004-01  ...  10500.0
20                          (ommited on purpose)  ...   3975.0
21                                      OR007-01  ...   2500.0
22                          (ommited on purpose)  ...   9200.0
23                          (ommited on purpose)  ...  15000.0
24                          (ommited on purpose)  ...   5750.0
25                                     CA1005-02  ...  47400.0
26                                     CA1005-03  ...  47400.0
27                                     CA1005-04  ...  47400.0
28                                     CA1005-05  ...  47400.0
29                                     CA1006-01  ...      0.0

DataFrame output:

     CID   Status JiraTicket   MrcNew
0    nan  Planned        nan      NaN
1    nan  Planned        nan      NaN
2    nan  Planned        nan      NaN
3    nan  Planned        nan      NaN
4    nan  Planned        nan      NaN
5    nan  Planned        nan      NaN
6    nan  Planned        nan      NaN
7    nan  Planned        nan      NaN
8    nan  Planned        nan      NaN
9    nan  Planned        nan      NaN
10   nan  Planned        nan      NaN
11   nan  Planned        nan      NaN
12   nan  Planned        nan      NaN
13   nan  Planned        nan      NaN
14   nan   Active        nan   5200.0
15   nan  Pending        nan      NaN
16   nan  Pending        nan   8000.0
17   nan  Pending        nan   6500.0
18   nan  Pending        nan   6500.0
19   nan  Pending        nan  10500.0
20   nan   Active        nan   3975.0
21   nan  Pending        nan   2500.0
22   nan   Active        nan   9200.0
23   nan  Pending        nan  15000.0
24   nan   Active        nan   5750.0
25   nan  Pending        nan  47400.0
26   nan  Pending        nan  47400.0
27   nan  Pending        nan  47400.0
28   nan  Pending        nan  47400.0
29   nan  Pending        nan      0.0

Upvotes: 0

Views: 496

Answers (1)

Parfait
Parfait

Reputation: 107737

Basically, you are using columns argument incorrectly in pandas.DataFrame where that arugment specifies columns to select in resulting output (not to rename). From your query there is no CID or JiraTicket and hence they migrate with all missing values.

Possibly you intended to rename columns. Consider renaming in either SQL with column aliases or in pandas with rename or set_axis:

SELECT [CircuitID] AS [CID], 
       [Status], 
       [LatestJiraTicket] AS JiraTicket, 
       [MrcNew] 
FROM CircuitInfoTable 
WHERE ([Status] = 'Active') 
   OR ([Status] = 'Pending')
   OR ([Status] = 'Planned')

Pandas

cdf = (pd.read_sql_query(...original query...)
         .rename(columns={'CircuitID': 'CID', 'LatestJiraTicket': 'JiraTicket'})
      )

cdf = (pd.read_sql_query(...original query...)
         .set_axis(['CID', 'Status', 'JiraTicket', 'MrcNew'], axis='columns', inplace=False)
      )

Upvotes: 1

Related Questions