Own Stop Words Removal

Description

Provide a list of stop words, and every occurrence of them will be removed from your document.

The stop words must be placed in a plain text file, one word per line. A stop word can also be a regular expression (regex) pattern, but the pattern must not contain spaces.
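Reading such a file could look like the minimal Python sketch below. It assumes the file is UTF-8, that blank lines are ignored, and that every entry (plain word or regex) is compiled as a regex, since a plain word such as "koq" is a valid pattern that matches only itself. The function names parse_stop_words and load_stop_words are hypothetical, and raising an error on entries with inner spaces is this sketch's choice, not documented tool behavior.

```python
import re
from typing import Iterable, List, Pattern


def parse_stop_words(lines: Iterable[str]) -> List[Pattern[str]]:
    """Turn the lines of a stop word file into compiled regex patterns.

    Hypothetical helper: each non-empty line is one stop word or regex
    pattern. Plain words are compiled as regexes too.
    """
    patterns = []
    for line_no, raw in enumerate(lines, start=1):
        entry = raw.strip()  # drop the trailing newline and stray edge spaces
        if not entry:
            continue  # skip blank lines
        if any(ch.isspace() for ch in entry):
            # After strip(), any remaining whitespace is inside the entry,
            # which the file format does not allow.
            raise ValueError(
                f"line {line_no}: stop word must not contain spaces: {entry!r}"
            )
        patterns.append(re.compile(entry))
    return patterns


def load_stop_words(path: str) -> List[Pattern[str]]:
    """Read a stop word file (assumed UTF-8), one entry per line."""
    with open(path, encoding="utf-8") as f:
        return parse_stop_words(f)
```

Note that a plain word containing regex metacharacters (for example "c++") would be interpreted as a pattern under this approach; escaping it with re.escape is one option if literal matching is needed.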

If you are not familiar with regex, find out more here.

Commonly used regex patterns:

Type      Example                                   Regex pattern
URL       http://www.doge.com, https://www.wow.com  https?://.*
Hashtag   #wow, #curcol, #love                      #.*
Mention   @will.gozali                              @.*
Numbers   021, 56001, 2123123123                    [0-9]+
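The patterns in the table can be checked against single words with a short Python sketch. It assumes that a pattern must match a whole space-separated word (re.fullmatch), which is consistent with the sample further down but is an inference rather than documented behavior. The PATTERNS dictionary and the classify function are hypothetical illustrations.

```python
import re

# Patterns from the table above, keyed by type.
PATTERNS = {
    "URL": r"https?://.*",
    "Hashtag": r"#.*",
    "Mention": r"@.*",
    "Numbers": r"[0-9]+",
}


def classify(token: str) -> list:
    """Return the types whose pattern matches the whole token."""
    return [name for name, pat in PATTERNS.items() if re.fullmatch(pat, token)]


for token in ["https://www.wow.com", "#curcol", "@will.gozali", "56001", "15.000,00"]:
    print(token, classify(token))
```

Under the whole-word assumption, "15.000,00" matches none of the patterns, because [0-9]+ does not allow the "." and "," separators.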

Requirement

  • A plain text file containing the list of stop words or regex patterns to be removed from your document, one entry per line.

Example

Sample stop word file content:

rp
koq
nih
http://.*
#.*

Sample input:

harga cabai Rp 15.000,00
harga cabai rp 15.000,00
bbm koq naik, warga sedih #edisicurhat
telah blokir website http://www.lucu.com

Sample output (the first line keeps "Rp" because matching is case-sensitive and only "rp" is listed):

harga cabai Rp 15.000,00
harga cabai 15.000,00
bbm naik, warga sedih
telah blokir website
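The sample above can be reproduced with a minimal Python sketch. It assumes whitespace tokenization, case-sensitive matching where a pattern must match a whole word, and the remaining words re-joined with single spaces. These assumptions are inferred from the sample and do not describe the tool's actual implementation; remove_stop_words is a hypothetical name.

```python
import re
from typing import Iterable


def remove_stop_words(text: str, patterns: Iterable[str]) -> str:
    """Drop every whitespace-separated token that fully matches a pattern.

    Matching is case-sensitive, so "rp" removes "rp" but keeps "Rp".
    str.split() also collapses runs of whitespace into single spaces.
    """
    compiled = [re.compile(p) for p in patterns]
    kept = [tok for tok in text.split() if not any(c.fullmatch(tok) for c in compiled)]
    return " ".join(kept)


stop_words = ["rp", "koq", "nih", "http://.*", "#.*"]
documents = [
    "harga cabai Rp 15.000,00",
    "harga cabai rp 15.000,00",
    "bbm koq naik, warga sedih #edisicurhat",
    "telah blokir website http://www.lucu.com",
]
for line in documents:
    print(remove_stop_words(line, stop_words))
```

Because the match is against whole words, "naik," is kept even though it sits next to a removed word: none of the patterns matches "naik," in full.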