Call graph construction is the foundation of inter-procedural static analysis. PyCG is the state-of-the-art approach for constructing call graphs for Python programs. Unfortunately, PyCG does not scale to large programs when adapted to whole-program analysis where application and dependent libraries are both analyzed. Moreover, PyCG is flow-insensitive and does not fully support Python’s features, hindering its accuracy.
To overcome these drawbacks, we propose a scalable and precise approach for constructing application-centered call graphs for Python programs, and implement it as a prototype tool, JARVIS. JARVIS maintains a type graph (i.e., type relations of program identifiers) for each function in a program to allow type inference. Taking a function as input, JARVIS generates the call graph on-the-fly, alternating between flow-sensitive intra-procedural analysis and inter-procedural analysis and applying strong updates. Our evaluation on a micro-benchmark of 135 small Python programs and a macro-benchmark of 6 real-world Python applications demonstrates that JARVIS significantly improves on PyCG: it is at least 67% faster, 84% higher in precision, and at least 20% higher in recall.
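To make the flow-sensitivity point concrete, the following minimal sketch shows a reassigned function reference. The function names and the two edge sets are hypothetical and hand-written for illustration; they are not output from JARVIS or PyCG.

```python
def greet():
    return "greet"


def farewell():
    return "farewell"


def main():
    handler = greet      # handler points to greet
    handler = farewell   # strong update: handler now points only to farewell
    return handler()     # at run time only farewell is called


# Hand-written illustration (an assumption, not tool output):
# a flow-insensitive analysis merges both assignments to `handler`.
flow_insensitive_edges = {("main", "greet"), ("main", "farewell")}
# A flow-sensitive analysis with strong updates keeps only the last assignment.
flow_sensitive_edges = {("main", "farewell")}

result = main()
spurious_edges = flow_insensitive_edges - flow_sensitive_edges
```

Here `spurious_edges` contains the single edge from `main` to `greet`, which is the kind of false positive that lowers precision.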
The paper has been submitted to ICSE 2025. The Jarvis artifact is provided here.
The micro-benchmark and macro-benchmark are provided in the dataset and grount_truth directories.
Prerequisites:
run jarvis_cli.py.
Jarvis usage:
$ python3 tool/Jarvis/jarvis_cli.py [module_path1 module_path2 module_path3...] [--package] [--decy] [-o output_path]
Jarvis help:
$ python3 tool/Jarvis/jarvis_cli.py -h
usage: jarvis_cli.py [-h] [--package PACKAGE] [--decy] [--precision]
                     [--moduleEntry [MODULEENTRY ...]]
                     [--operation {call-graph,key-error}] [-o OUTPUT]
                     [module ...]

positional arguments:
  module                modules to be processed, which are also application entries in A.W. mode

options:
  -h, --help            show this help message and exit
  --package PACKAGE     Package containing the code to be analyzed
  --decy                whether analyze the dependencies
  --precision           whether flow-sensitive
  --entry-point [MODULEENTRY ...]
                        Entry functions to be processed
  -o OUTPUT, --output OUTPUT
                        Output call graph path
Example 1: analyze bpytop.py in E.A. mode.
$ python3 tool/Jarvis/jarvis_cli.py dataset/macro_benchmark/pj/bpytop/bpytop.py --package dataset/macro_benchmark/pj/bpytop -o jarvis.json
Example 2: analyze bpytop.py in A.W. mode. Note that all dependencies must first be installed in the virtual environment.
# create the virtualenv environment
$ virtualenv venv --python=python3.8
# activate it so that the following commands use it
$ source venv/bin/activate
# install dependencies in the virtualenv environment
$ python3 -m pip install psutil
# run jarvis
$ python3 tool/Jarvis/jarvis_cli.py dataset/macro_benchmark/pj/bpytop/bpytop.py --package dataset/macro_benchmark/pj/bpytop --decy -o jarvis.json
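The generated call graph can then be inspected programmatically. The sketch below assumes that jarvis.json is a JSON object mapping each caller to a list of callees, as in PyCG's simple output format; this README does not specify the format, so verify it against an actual output file. The helper names `load_call_graph` and `summarize` are hypothetical.

```python
import json
from pathlib import Path


def load_call_graph(path):
    """Load a call graph stored as {caller: [callee, ...]} (assumed format)."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Deduplicate and sort callees so that results are deterministic.
    return {caller: sorted(set(callees)) for caller, callees in data.items()}


def summarize(graph):
    """Return (number of nodes, number of edges) of the call graph."""
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    edges = sum(len(callees) for callees in graph.values())
    return len(nodes), edges
```

For example, `summarize(load_call_graph("jarvis.json"))` would give a quick size check of the graph produced by the command above.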
cd to the root directory of the unzipped files.
# 1. run micro_benchmark
$ ./reproducing_RQ12_setup/micro_benchmark/test_All.sh
# 2. run macro_benchmark
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EA.sh
# PyCG iterates once
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh 1
# PyCG iterates twice
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh 2
# PyCG iterates to convergence
$ ./reproducing_RQ12_setup/macro_benchmark/pycg_EW.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_AA.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_EA.sh
$ ./reproducing_RQ12_setup/macro_benchmark/jarvis_AW.sh
Run
$ python3 ./reproducing_RQ1/gen_table.py
The results are shown below:
Run
$ pip3 install matplotlib
$ pip3 install numpy
$ python3 ./reproducing_RQ1/FTG/plot.py
The generated graphs are pycg-ag.pdf, pycg-change-ag.pdf and jarvis-ftg.pdf, which correspond to Fig. 9a, Fig. 9b and Fig. 10, respectively.
Run
$ python3 ./reproducing_RQ2/gen_table.py
The generated results:
Scalability results (RQ1), AE denotes AssertionError:
Accuracy results (RQ2):
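For readers who want to sanity-check accuracy numbers by hand, the following sketch computes precision and recall over call edges. It uses one common edge-level definition and assumes both graphs are dictionaries of the form {caller: [callee, ...]}; the evaluation scripts in this artifact may compute the metrics differently. The function names are hypothetical.

```python
def edge_set(graph):
    """Flatten {caller: [callee, ...]} into a set of (caller, callee) edges."""
    return {(caller, callee) for caller, callees in graph.items() for callee in callees}


def precision_recall(generated, ground_truth):
    """Edge-level precision and recall of a generated call graph."""
    gen = edge_set(generated)
    truth = edge_set(ground_truth)
    true_positives = len(gen & truth)
    # Convention chosen here: an empty edge set counts as perfect (1.0).
    precision = true_positives / len(gen) if gen else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Toy graphs, made up for illustration only.
generated = {"main": ["f", "g"], "f": ["h"]}
ground_truth = {"main": ["f"], "f": ["h", "k"]}
precision, recall = precision_recall(generated, ground_truth)
```

In this toy case two of the three generated edges are correct and two of the three true edges are found, so both metrics are 2/3.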
The 43 Python projects out of the top 200 highly-starred projects are listed in file
Fastapi, Httpie, Scrapy, Lightning, Airflow, sherlock, wagtail
The CVEs of html, numpy, lxml and psutil are not related to Python, so we do not consider them.
- sherlock.sherlock
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- sherlock.sites
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.kubernetes.kube_client
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.providers.cncf.kubernetes.operators.pod
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.providers.cncf.kubernetes.utils.pod_manager
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- airflow.executors.kubernetes_executor
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
......
- wagtail.contrib.frontend_cache.backends
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.client
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.ssl_
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- httpie.models
  - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- scrapy.downloadermiddlewares.cookies
  - tldextract(v3.4.4)
    - requests(v2.28.0)
      - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
- lightning.app.utilities.network
  - requests(v2.28.0)
    - urllib3(v1.26.0) ---- [CVE-2021-33503,CVE-2019-11324,CVE-2019-11236,CVE-2020-7212]
...
According to the patch commit, the vulnerable method of CVE-2021-33503 in urllib3 is urllib3.util.url.
Below are the method-level invocation paths:
- httpie.adapters.<main>
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
- scrapy.downloadermiddlewares.cookies.<main>
  - tldextract.__init__.<main>
    - tldextract.tldextract.<main>
      - tldextract.suffix_list.<main>
        - requests_file.<main>
          - requests.adapters.<main>
            - urllib3.util.url.<main> ---- CVE-2021-33503
- lightning.app.utilities.network.<main>
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
- airflow.providers.amazon.aws.hooks.base_aws.BaseSessionFactory._get_idp_response
  - requests.adapters.<main>
    - urllib3.contrib.socks.<main>
      - urllib3.util.url.<main> ---- CVE-2021-33503
PS: <main> denotes the top-level code block of a Python file (Python does not require an entry function).
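Invocation paths like those listed above can be recovered from a call graph with a plain breadth-first search. The sketch below assumes the call graph is available as a dictionary of the form {caller: [callee, ...]}; the function name `find_call_path` is hypothetical, and the small graph only re-encodes one path from the list above rather than real tool output.

```python
from collections import deque


def find_call_path(graph, source, target):
    """Return one shortest caller-to-callee path from source to target, or None."""
    if source == target:
        return [source]
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, ()):
            if callee in parent:
                continue
            parent[callee] = node
            if callee == target:
                # Walk the parent links back to the source.
                path = [callee]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(callee)
    return None


# One of the paths listed above, encoded by hand for illustration.
graph = {
    "lightning.app.utilities.network.<main>": ["requests.adapters.<main>"],
    "requests.adapters.<main>": ["urllib3.contrib.socks.<main>"],
    "urllib3.contrib.socks.<main>": ["urllib3.util.url.<main>"],
}
path = find_call_path(
    graph,
    "lightning.app.utilities.network.<main>",
    "urllib3.util.url.<main>",
)
```

Breadth-first search returns a shortest path, which keeps the reported chain easy to review; enumerating all paths would require a different traversal.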
Our artifact reuses part of the functionality of third-party libraries, i.e., PyCG.
Vitalis Salis et al. PyCG: Practical Call Graph Generation in Python. In 43rd International Conference on Software Engineering (ICSE), 25–28 May 2021.