We present our new Android malware dataset, named CIC-AndMal2017. In this approach, we run both our malware and benign applications on real smartphones rather than emulators, because some advanced malware samples can detect an emulator environment and modify their runtime behavior. We collected 10,854 samples (4,354 malware and 6,500 benign) from several sources, including more than six thousand benign apps published on the Google Play market in 2015, 2016, and 2017.
We installed more than 5,000 of the collected samples (426 malware and 5,065 benign, 5,491 in total) on real devices. The malware samples in the CIC-AndMal2017 dataset are classified into four categories:
- Adware
- Ransomware
- Scareware
- SMS Malware
Our samples come from 42 unique malware families. The families in each category and the number of captured samples for each family are as follows:
Adware:
- Dowgin family, 10 captured samples
- Ewind family, 10 captured samples
- Feiwo family, 15 captured samples
- Gooligan family, 14 captured samples
- Kemoge family, 11 captured samples
- Koodous family, 10 captured samples
- Mobidash family, 10 captured samples
- Selfmite family, 4 captured samples
- Shuanet family, 10 captured samples
- Youmi family, 10 captured samples
Ransomware:
- Charger family, 10 captured samples
- Jisut family, 10 captured samples
- Koler family, 10 captured samples
- LockerPin family, 10 captured samples
- Simplocker family, 10 captured samples
- Pletor family, 10 captured samples
- PornDroid family, 10 captured samples
- RansomBO family, 10 captured samples
- Svpeng family, 11 captured samples
- WannaLocker family, 10 captured samples
Scareware:
- AndroidDefender family, 17 captured samples
- AndroidSpy.277 family, 6 captured samples
- AV for Android family, 10 captured samples
- AVpass family, 10 captured samples
- FakeApp family, 10 captured samples
- FakeApp.AL family, 11 captured samples
- FakeAV family, 10 captured samples
- FakeJobOffer family, 9 captured samples
- FakeTaoBao family, 9 captured samples
- Penetho family, 10 captured samples
- VirusShield family, 10 captured samples
SMS Malware:
- BeanBot family, 9 captured samples
- Biige family, 11 captured samples
- FakeInst family, 10 captured samples
- FakeMart family, 10 captured samples
- FakeNotify family, 10 captured samples
- Jifake family, 10 captured samples
- Mazarbot family, 9 captured samples
- Nandrobox family, 11 captured samples
- Plankton family, 10 captured samples
- SMSsniffer family, 9 captured samples
- Zsone family, 10 captured samples
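As a quick consistency check, the per-family counts listed above can be tallied in a few lines of Python. The dictionary simply transcribes those lists (the names `FAMILIES`, `INSTALLED_BENIGN`, and `summarize` are our own, chosen for illustration); the sums reproduce the 426 captured malware samples and the 42 families, and adding the 5,065 installed benign apps gives the number of samples installed on the devices.

```python
# Per-family captured-sample counts, transcribed from the category lists above.
FAMILIES = {
    "Adware": {"Dowgin": 10, "Ewind": 10, "Feiwo": 15, "Gooligan": 14, "Kemoge": 11,
               "Koodous": 10, "Mobidash": 10, "Selfmite": 4, "Shuanet": 10, "Youmi": 10},
    "Ransomware": {"Charger": 10, "Jisut": 10, "Koler": 10, "LockerPin": 10,
                   "Simplocker": 10, "Pletor": 10, "PornDroid": 10, "RansomBO": 10,
                   "Svpeng": 11, "WannaLocker": 10},
    "Scareware": {"AndroidDefender": 17, "AndroidSpy.277": 6, "AV for Android": 10,
                  "AVpass": 10, "FakeApp": 10, "FakeApp.AL": 11, "FakeAV": 10,
                  "FakeJobOffer": 9, "FakeTaoBao": 9, "Penetho": 10, "VirusShield": 10},
    "SMS Malware": {"BeanBot": 9, "Biige": 11, "FakeInst": 10, "FakeMart": 10,
                    "FakeNotify": 10, "Jifake": 10, "Mazarbot": 9, "Nandrobox": 11,
                    "Plankton": 10, "SMSsniffer": 9, "Zsone": 10},
}
INSTALLED_BENIGN = 5065  # benign apps installed on the devices, as stated in the text


def summarize(families):
    """Return per-category sample totals, the family count, and the overall total."""
    per_category = {cat: sum(fams.values()) for cat, fams in families.items()}
    n_families = sum(len(fams) for fams in families.values())
    return per_category, n_families, sum(per_category.values())


per_category, n_families, total = summarize(FAMILIES)
for category, count in per_category.items():
    print(f"{category}: {count} captured samples")
print(f"{n_families} families, {total} captured malware samples")
print(f"Installed on devices: {total + INSTALLED_BENIGN} samples")
```

The category totals come out to 104 (Adware), 101 (Ransomware), 112 (Scareware), and 109 (SMS Malware).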
To obtain a comprehensive view of our malware samples, we created a specific scenario for each malware category. We also defined three data-capture states to overcome the stealthiness of advanced malware:
1. Installation: the first capture state, which begins immediately after the malware is installed (1 to 3 min).
2. Before restart: the second capture state, which takes place during the 15 min before the phone is rebooted.
3. After restart: the last capture state, which takes place during the 15 min after the phone is rebooted.
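The three capture states can be read as time windows relative to the installation and reboot times. The sketch below is a hypothetical helper, not part of the dataset's tooling: it assumes timestamps in seconds, takes the 3-minute upper bound of the installation window, and treats the two restart states as the 15 minutes on either side of the reboot. The names `capture_state`, `INSTALL_WINDOW_S`, and `RESTART_WINDOW_S` are our own.

```python
from typing import Optional

INSTALL_WINDOW_S = 3 * 60   # assumption: upper end of the 1 to 3 min installation window
RESTART_WINDOW_S = 15 * 60  # assumption: 15 min on each side of the reboot


def capture_state(t: float, install_time: float, reboot_time: float) -> Optional[str]:
    """Map a timestamp (seconds) to one of the three capture states, or None if outside all."""
    if install_time <= t < install_time + INSTALL_WINDOW_S:
        return "installation"
    if reboot_time - RESTART_WINDOW_S <= t < reboot_time:
        return "before_restart"
    if reboot_time <= t < reboot_time + RESTART_WINDOW_S:
        return "after_restart"
    return None


# Example: malware installed at t=0 s, phone rebooted one hour later.
for t in (60, 1000, 3000, 4000):
    print(t, capture_state(t, install_time=0, reboot_time=3600))
```

If the installation window overlaps a restart window (for example, a reboot shortly after installation), the helper gives precedence to the installation state; that ordering is an arbitrary choice made for this sketch.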
For feature extraction and selection, we captured the network traffic (.pcap files) and used CICFlowMeter-V3 to extract more than 80 network-flow features during all three capture states (installation, before restart, and after restart).
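CICFlowMeter writes its output as CSV, one row per network flow. A minimal sketch of loading such rows into numeric feature dictionaries follows, using only the Python standard library. The identifier column set `ID_COLUMNS`, the sample header names, and the helper `load_flows` are illustrative assumptions; header names differ between CICFlowMeter versions, so check them against the actual CSV files before reusing this.

```python
import csv
import io
import math
from typing import Dict, Iterable, List, Tuple

# Assumed non-feature (identifier/label) columns; real header names vary by CICFlowMeter version.
ID_COLUMNS = {"Flow ID", "Source IP", "Source Port", "Destination IP",
              "Destination Port", "Protocol", "Timestamp", "Label"}


def load_flows(lines: Iterable[str]) -> Tuple[List[Dict[str, float]], List[str]]:
    """Parse CICFlowMeter-style CSV lines into numeric feature dicts plus per-flow labels."""
    reader = csv.reader(lines)
    header = [name.strip() for name in next(reader)]  # headers may carry stray spaces
    features, labels = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, (value.strip() for value in row)))
        labels.append(record.get("Label", ""))
        numeric = {}
        for name, value in record.items():
            if name in ID_COLUMNS:
                continue
            try:
                x = float(value)
            except ValueError:
                continue  # skip empty or non-numeric cells
            if math.isfinite(x):
                numeric[name] = x  # drop Infinity/NaN values
        features.append(numeric)
    return features, labels


# Illustrative two-line CSV; the column names are assumptions, not a verified header.
sample = io.StringIO(
    "Flow ID, Source IP, Flow Duration, Total Fwd Packets, Flow Bytes/s, Label\n"
    "f1, 10.0.0.2, 1200, 3, Infinity, BENIGN\n"
)
feats, labs = load_flows(sample)
print(feats, labs)
```

Dropping non-finite values is one simple policy; replacing them with a large constant or a column median are common alternatives, depending on the classifier used downstream.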
You may redistribute, republish, and mirror the CIC-AndMal2017 dataset in any form. However, any use or redistribution of the data must include a citation to the CIC-AndMal2017 dataset and the following paper:
- Arash Habibi Lashkari, Andi Fitriah A. Kadir, Laya Taheri, and Ali A. Ghorbani, “Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification”, In the proceedings of the 52nd IEEE International Carnahan Conference on Security Technology (ICCST), Montreal, Quebec, Canada, 2018.