
impyla

Python client for HiveServer2 implementations (e.g., Impala, Hive) of distributed query engines.

For higher-level Impala functionality, including a Pandas-like interface over distributed data sets, see the Ibis project.

Features

  • HiveServer2 compliant; works with Impala and Hive, including nested data

  • Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

  • Works with Kerberos, LDAP, SSL

  • SQLAlchemy connector

  • Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience

Dependencies

Required:

  • Python 2.6+ or 3.3+

  • six, bitarray

  • thrift (on Python 2.x) or thriftpy (on Python 3.x)

For Hive and/or Kerberos support:

pip install thrift_sasl
pip install sasl

Optional:

  • pandas for conversion to DataFrame objects; but see the Ibis project instead

  • sqlalchemy for the SQLAlchemy engine

  • pytest for running tests; unittest2 for testing on Python 2.6

Installation

Install the latest release (0.13.1) with pip:

pip install impyla

For the latest (dev) version, install directly from the repo:

pip install git+https://github.com/cloudera/impyla.git

or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install

Running the tests

impyla uses the pytest toolchain, and depends on the following environment variables:

export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
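
As a rough illustration of how these variables might be consumed, the sketch below reads them into a settings dictionary. The helper name `read_test_settings` and the fallback values for the port and auth mechanism are assumptions made for this example, not part of impyla's test suite.

```python
import os


def read_test_settings(environ=os.environ):
    """Collect the IMPYLA_TEST_* variables into a dict (hypothetical helper)."""
    return {
        # The host has no sensible default, so a missing value raises KeyError.
        "host": environ["IMPYLA_TEST_HOST"],
        # Assumed fallbacks, chosen to match the example values shown above.
        "port": int(environ.get("IMPYLA_TEST_PORT", "21050")),
        "auth_mechanism": environ.get("IMPYLA_TEST_AUTH_MECH", "NOSASL"),
    }


# A plain dict stands in for the real environment so the example is repeatable.
settings = read_test_settings({
    "IMPYLA_TEST_HOST": "your.impalad.com",
    "IMPYLA_TEST_PORT": "21050",
})
print(settings)
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise without touching the shell.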

To run the maximal set of tests, run:

cd path/to/impyla
py.test --connect impyla

Leave out the --connect option to skip tests for DB API compliance.

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
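
Because the interface is PEP 249, the same connect/cursor/execute/fetch pattern can be tried without a cluster. The sketch below uses the standard-library `sqlite3` module purely as a stand-in for `impala.dbapi`; the table contents are made up for the example.

```python
import sqlite3

# Stand-in for impala.dbapi.connect(...): sqlite3 also implements PEP 249.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table, created only so the query below has something to read.
cursor.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

cursor.execute("SELECT * FROM mytable LIMIT 100")

# description is a sequence of 7-item tuples; the first item is the column name.
column_names = [col[0] for col in cursor.description]
results = cursor.fetchall()

print(column_names)
print(results)

cursor.close()
conn.close()
```

Only the `connect` call is driver-specific; the cursor calls are the ones PEP 249 defines.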

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
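
To make "buffered" concrete, here is a minimal sketch of batch-wise iteration written against the generic PEP 249 cursor methods. It is not impyla's internal implementation; the generator name `iter_rows` is hypothetical, and `sqlite3` again stands in for a real Impala connection.

```python
import sqlite3


def iter_rows(cursor):
    """Yield rows one at a time, fetching them in batches of cursor.arraysize."""
    while True:
        batch = cursor.fetchmany(cursor.arraysize)
        if not batch:
            # An empty batch means the result set is exhausted.
            return
        for row in batch:
            yield row


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (n INTEGER)")
cursor.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])

cursor.arraysize = 4  # rows are pulled four at a time
cursor.execute("SELECT n FROM mytable ORDER BY n")
rows = list(iter_rows(cursor))
print(rows)

conn.close()
```

A larger `arraysize` means fewer fetch calls at the cost of holding more rows in memory at once.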

You can also get back a pandas DataFrame object:

from impala.util import as_pandas
df = as_pandas(cursor)  # cursor from the query above
# carry df through scikit-learn, for example
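
The information such a conversion needs is already on the cursor: column names from `cursor.description` and rows from `fetchall()`. The sketch below gathers both into a column-oriented dictionary using `sqlite3` as a stand-in. It is an illustration only, not how `as_pandas` is implemented, and the helper name `cursor_to_columns` is hypothetical.

```python
import sqlite3


def cursor_to_columns(cursor):
    """Return {column_name: [values...]} for the cursor's current result set."""
    names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (id INTEGER, score REAL)")
cursor.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 0.5), (2, 0.75)])

cursor.execute("SELECT id, score FROM mytable ORDER BY id")
columns = cursor_to_columns(cursor)
print(columns)

conn.close()

# If pandas is installed, pandas.DataFrame(columns) accepts this dict of lists.
```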