The Dynamic Duo of Python & SQL
Why do we need to connect SQL & Python?
In today’s data-driven world, information is power. In the ever-expanding universe of data, the ability to extract meaningful insights has become the North Star guiding businesses, organizations, and individuals toward informed decision-making.
Python is a general-purpose programming language that is known for its simplicity and readability. Python is also very versatile, and it can be used for a wide variety of tasks, including data analysis, machine learning, and web development.
SQL stands for Structured Query Language. It is a declarative language, which means that you tell the database what you want to do, but not how to do it. SQL is a very powerful language, and it can be used to perform complex queries on large datasets.
Python, celebrated for its versatility and simplicity, has firmly established itself as a beloved tool in the arsenal of data analysts and scientists. But what makes Python truly shine in the world of data is its symbiotic relationship with Structured Query Language (SQL), the key that unlocks the doors to vast data repositories.
Connecting Python to SQL databases allows you to leverage the power of Python for data analysis, manipulation, and visualization while using SQL to interact with and manage your data in a database.
There are various ways to establish connectivity between Python and SQL databases, with the most common methods being:
Using Database Libraries:
- sqlite3 (built-in): Python comes with the built-in sqlite3 module, which allows you to interact with SQLite databases without any additional installations.
- Third-party Libraries: For other relational databases like MySQL, PostgreSQL, Oracle, or SQL Server, you can use third-party libraries like:
  - mysql-connector-python or pymysql for MySQL
  - psycopg2 for PostgreSQL
  - cx_Oracle for Oracle
  - pyodbc for SQL Server
These libraries provide Python functions and classes to establish connections, execute SQL queries, and fetch results from databases.
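As a minimal sketch of that pattern, here is how connecting and running a query might look with pymysql (the host, credentials, database, and the employees table below are placeholders to replace with your own):
import pymysql

# Placeholder credentials and database name -- adjust for your environment
connection = pymysql.connect(host='localhost', user='root',
                             password='password', database='mydatabase')
cursor = connection.cursor()
cursor.execute("SELECT * FROM employees")  # 'employees' is a hypothetical table
rows = cursor.fetchall()
print(rows)
connection.close()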
Connecting to the Database
A connection to an SQLite database can be established using the connect() method, passing the name of the database file to be accessed as a parameter. If that database does not exist, it will be created.
import sqlite3
sqliteConnection = sqlite3.connect('sql.db')
But what if you want to execute some queries after the connection is made? For that, a cursor has to be created using the cursor() method on the connection instance, which will execute our SQL queries.
cursor = sqliteConnection.cursor()
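From here, SQL statements can be executed through the cursor. A minimal continuation sketch, building on the connection and cursor created above (the users table and its columns are made up purely for illustration):
# Create a small table, insert a row, and read it back
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
sqliteConnection.commit()

cursor.execute("SELECT id, name FROM users")
print(cursor.fetchall())

sqliteConnection.close()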
Creating an SQLite engine with SQLAlchemy's create_engine():
from sqlalchemy import create_engine
engine = create_engine("sqlite:///sql.db")
PostgreSQL
The PostgreSQL dialect uses psycopg2 as the default DBAPI. Other PostgreSQL DBAPIs include pg8000:
# default
engine = create_engine("postgresql://scott:tiger@localhost/mydatabase")
# psycopg2
engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/mydatabase")# pg8000
engine = create_engine("postgresql+pg8000://scott:tiger@localhost/mydatabase")
MySQL
The MySQL dialect uses mysqlclient as the default DBAPI. There are other MySQL DBAPIs available, including PyMySQL:
# default
engine = create_engine("mysql://root:password@localhost/database_name")
# mysqlclient (a maintained fork of MySQL-Python)
engine = create_engine("mysql+mysqldb://root:password@localhost/db_name")# PyMySQL
engine = create_engine("mysql+pymysql://root:password@localhost/db_name")
Oracle
The Oracle dialect uses cx_oracle as the default DBAPI:
engine = create_engine("oracle://root:password@127.0.0.1:1521/db_name")
engine = create_engine("oracle+cx_oracle://root:password@tnsname")
Microsoft SQL Server
The SQL Server dialect uses pyodbc as the default DBAPI. pymssql is also available:
# pyodbc
engine = create_engine("mssql+pyodbc://root:password@mydsn")
# pymssql
engine = create_engine("mssql+pymssql://root:password@hostname:port/dbname")
Pandas functions for working with SQL queries effectively and easily:
SQL and pandas both have a place in a functional data analysis tech stack, and they work best together: do the really heavy lifting directly in the database instance via SQL, then do the finer-grained data analysis on your local machine using pandas. The functions below are the bridge that makes this possible.
pd.read_sql_table: To read a SQL table into a DataFrame using only the table name, without writing any query, we use the read_sql_table() method in Pandas. This function does not support DBAPI connections; it requires an SQLAlchemy connectable (engine or connection).
pandas.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)
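For example, assuming the SQLite engine created earlier and the hypothetical users table from the sketches above:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///sql.db")
df = pd.read_sql_table("users", con=engine)  # 'users' is a placeholder table name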
pd.read_sql_query: Returns a DataFrame corresponding to the result set of the query string.
pandas.read_sql_query(sql_query, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None, dtype_backend=_NoDefault.no_default)
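For example, again assuming the engine and the hypothetical users table from above:
# Only rows matching the query are pulled into the DataFrame
df = pd.read_sql_query("SELECT id, name FROM users WHERE id > 0", con=engine)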
DataFrame.to_sql: The to_sql() function is used to write records stored in a DataFrame to a SQL database.
DataFrame.to_sql(name, con, *, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
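For example, a small sketch writing a DataFrame to a hypothetical scores table through the same engine:
# Write a small DataFrame to the database, replacing the table if it already exists
scores = pd.DataFrame({"name": ["Alice", "Bob"], "score": [90, 85]})
scores.to_sql("scores", con=engine, if_exists="replace", index=False)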
pd.read_sql: The read_sql() function is used to read a SQL query or a database table into a DataFrame. It is a wrapper around read_sql_query() and read_sql_table(): based on the input, it calls one of these functions internally and returns the result as a two-dimensional data structure with labeled axes.
pandas.read_sql(sql_query_or_table, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None, dtype_backend=_NoDefault.no_default, dtype=None)
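For example, read_sql accepts either a table name or a query string (the users table remains a placeholder):
df_table = pd.read_sql("users", con=engine)                # dispatches to read_sql_table()
df_query = pd.read_sql("SELECT * FROM users", con=engine)  # dispatches to read_sql_query()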