This page is where I consolidate all my notes on the grep tool, which is available on Linux and Unix, and also on Windows as part of Cygwin. I usually use it on Cygwin, so these notes apply to that version of grep.
grep is used for searching files. You give it a pattern to look for (a plain string or a regular expression) and the file(s) to search. By default it prints every matching line, prefixed with the file name when more than one file is searched. This output can then be parsed. The grep manual page gives the basic syntax:
grep [OPTIONS] PATTERN [FILE...]
FILE can be a single file, several files separated by spaces (e.g. 1.txt 2.txt) or a wildcard expression such as *.txt, which the shell expands into the matching file names before grep runs. PATTERN is the search string or regular expression, and OPTIONS control the program's behaviour.
grep can search recursively through all files in the current directory and all of its subdirectories:
grep -r PATTERN ./
The ./ tells grep to start in the current directory, and the -r option makes it descend into every subdirectory below it.
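To see -r in action, here is a minimal sketch that builds a small throwaway directory tree and searches it. The directory layout, file names and contents are invented for illustration, and the temporary directory is assumed to be creatable with mktemp.

```shell
# Build a hypothetical tree in a temporary directory (removed on exit).
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir -p src/util
printf 'int main() { return 0; }\n' > top.cpp
printf 'void helper() {}\n'         > src/util/helper.cpp
printf 'main is mentioned here\n'   > src/notes.txt

# -r descends from ./ into src/ and src/util/; each hit is printed
# as path:line because more than one file is searched.
grep -r main ./
```

Only top.cpp and src/notes.txt contain "main", so two lines are printed; the order depends on how the filesystem lists the directory entries.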
To search recursively but only in specific file types (e.g. all .cpp files), you cannot use grep -r PATTERN *.cpp. The shell expands *.cpp into the .cpp files in the current directory before grep runs, so grep only searches those files and never looks in subdirectories.
The way to search recursively and limit the search to specific file types is grep's --include option:
grep -r PATTERN --include='*.cpp' ./
The --include option tells grep to search only files whose name matches the given wildcard, in this case *.cpp. The quotes stop the shell from expanding the wildcard itself, so grep receives it unchanged. The final ./ tells grep where to start the recursive search (newer versions of GNU grep default to the current directory when -r is given without a file name, but stating it explicitly is clearer). The value of --include can be other wildcards as well, such as data*.txt.
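The difference between letting the shell expand *.cpp and letting grep match file names with --include can be checked in a small throwaway tree. This is a sketch under assumed, invented file names: one .cpp file at the top level and one in a subdirectory.

```shell
# Hypothetical tree: main.cpp at the top, lib/util.cpp and lib/util.h below it.
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir -p lib
printf 'TODO: top level\n' > main.cpp
printf 'TODO: nested\n'    > lib/util.cpp
printf 'TODO: header\n'    > lib/util.h

# The shell turns *.cpp into "main.cpp" before grep starts,
# so lib/util.cpp is never searched, even with -r.
glob_hits=$(grep -r TODO *.cpp | wc -l)

# With --include, grep does the name matching itself while it recurses.
# The quotes keep the shell from expanding '*.cpp' first.
include_hits=$(grep -r TODO --include='*.cpp' ./ | wc -l)

echo "shell glob: $glob_hits, --include: $include_hits"
```

The shell-glob version finds one matching line and the --include version finds two; lib/util.h is skipped by both.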
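To search several file types, repeat the --include option once per wildcard:
grep -r PATTERN --include='*.h' --include='*.cpp' ./
In Bash you can shorten this with brace expansion, which rewrites --include=*.{h,cpp} as --include=*.h --include=*.cpp before grep runs:
grep -r PATTERN --include=*.{h,cpp} ./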
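A quick way to confirm what the brace form expands to is to print the arguments Bash actually passes on. The sketch below assumes Bash (POSIX sh has no brace expansion) and uses invented file names to check that the braced and spelled-out forms select the same files.

```shell
# Hypothetical files: two that should match and one that should not.
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p src
printf 'FIXME\n' > src/a.h
printf 'FIXME\n' > src/a.cpp
printf 'FIXME\n' > src/a.txt

# printf shows one argument per line: the brace form becomes two options.
printf '%s\n' --include=*.{h,cpp}

# Both spellings should list the same files (sorted for a stable comparison).
braced=$(grep -rl FIXME --include=*.{h,cpp} ./ | sort)
explicit=$(grep -rl FIXME --include='*.h' --include='*.cpp' ./ | sort)
[ "$braced" = "$explicit" ] && echo "same files: $braced"
```

The printf line prints --include=*.h and --include=*.cpp on separate lines, and src/a.txt is not listed by either grep command.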
You can count the lines matching a pattern with grep's -c option. Note that it counts matching lines, not individual matches. To print all matching lines followed by the count, run two commands:
grep PATTERN file; grep -c PATTERN file
There are other ways of doing this with tools other than grep (for example, piping grep's output into wc -l), but I have not needed them.
You can store the count in a Bash variable with command substitution:
count=$(grep -c PATTERN file)
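Because -c counts lines rather than occurrences, a line containing the pattern twice is counted once. The sketch below stores the count in a variable and, for comparison, counts every occurrence with grep -o (which prints each match on its own line) piped into wc -l. The log lines are invented for illustration.

```shell
# Hypothetical input: the last line contains "error" twice.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'error: disk full\ninfo: ok\nerror: timeout\nerror: error again\n' > "$tmp"

# $(...) does the same job as backticks but nests and quotes more cleanly.
count=$(grep -c error "$tmp")
echo "matching lines: $count"

# -o prints each match separately, so wc -l counts occurrences instead.
occurrences=$(grep -o error "$tmp" | wc -l)
echo "occurrences: $occurrences"

# The variable can then be used in ordinary shell tests.
if [ "$count" -gt 0 ]; then
    echo "found at least one error line"
fi
```

Here three lines match but the word occurs four times.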
I covered this in my text processing with shell utilities article, which also contains tips on how to process text with other shell utilities.
You can grep for two (or more) words at once. For example, create a file named file.txt containing these lines:
One
Two
Three
Four
Five
Then use the following grep command (the -E option enables extended regular expressions, which lets you use the | as an “or” operator):
grep -E 'One|Two|Three' file.txt
The output is:
One
Two
Three
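The same search can be written without -E by passing each pattern with its own -e option; grep prints a line if it matches any of them. This sketch recreates the five-line sample file in a temporary location and runs both forms.

```shell
# Recreate the sample file: one word per line.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' One Two Three Four Five > "$tmp"

# Extended regular expression with | as "or".
grep -E 'One|Two|Three' "$tmp"

# Equivalent: several -e options, one pattern each.
grep -e One -e Two -e Three "$tmp"
```

Both commands print One, Two and Three, each on its own line.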
This is also covered in my text processing with shell utilities article.
Here are some simple examples of how to use grep.
You can pipe the output of other programs into grep, which is extremely useful for parsing that output. For example:
cat test.txt | grep [OPTIONS] PATTERN
You can also parse the output of grep using other grep commands:
cat test.txt | grep [OPTIONS] PATTERN1 | grep [OPTIONS] PATTERN2
This allows for filtering of text information based on matches to the specified patterns.
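As a small illustration of chaining filters on another program's output, the sketch below uses seq to generate the numbers 1 to 30, keeps the lines containing a 1, then drops those that also contain a 0. The choice of seq and of these patterns is just for demonstration.

```shell
# seq produces 1..30, one number per line; each grep narrows the result.
seq 1 30 | grep 1 | grep -v 0
```

This prints 1, 11 through 19, and 21; 10 is removed by the second filter.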
Grep can also be used to find lines in the file(s) that do not match the pattern by using the -v or --invert-match option:
grep -v PATTERN [FILE...]
grep --invert-match PATTERN [FILE...]
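A common use of -v is stripping comments and blank lines from a configuration file. This is a sketch with an invented file; '^#' matches lines starting with # and '^$' matches empty lines.

```shell
# Hypothetical config file with comments and a blank line.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
printf '# comment\nport=8080\n\n# another comment\nhost=example\n' > "$conf"

# First drop comment lines, then drop empty lines.
grep -v '^#' "$conf" | grep -v '^$'
```

Only port=8080 and host=example remain.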
grep is very flexible and can also list the names of all files containing a match for a pattern, instead of the matching lines:
grep -l PATTERN *
grep --files-with-matches PATTERN *
It can also find all files that do not contain the pattern. Note the option to skip directories; without it, grep treats each directory as a file and outputs its name as well:
grep --directories=skip -L PATTERN *
grep --directories=skip --files-without-match PATTERN *
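The following sketch runs -l and -L side by side in a throwaway directory that also contains a subdirectory, so --directories=skip has something to skip. The file names and contents are invented.

```shell
# Hypothetical files: a.txt and c.txt contain "needle", b.txt does not.
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir subdir
printf 'needle\n' > a.txt
printf 'hay\n'    > b.txt
printf 'needle\n' > c.txt

# Files that contain the pattern.
found=$(grep --directories=skip -l needle *)
# Files that do not contain it; subdir is skipped rather than listed.
missing=$(grep --directories=skip -L needle *)

echo "with match: $found"
echo "without match: $missing"
```

a.txt and c.txt are reported by -l, and only b.txt by -L.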
grep is also useful for finding lines in a single file. For example, the following prints all lines that contain the word Firefox in your web server logs, which will let you see all visits with the Firefox browser:
grep Firefox example.com-Apr-2011
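To try this without a real server log, the sketch below writes three invented lines in Apache combined log format (the IP addresses come from documentation ranges) to a temporary file standing in for example.com-Apr-2011, then lists the Firefox visits and counts them with -c.

```shell
# Hypothetical access log; only the user-agent field matters here.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
203.0.113.5 - - [01/Apr/2011:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"
203.0.113.9 - - [01/Apr/2011:10:05:00 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)"
198.51.100.7 - - [02/Apr/2011:08:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:2.0) Gecko/20100101 Firefox/4.0"
EOF

# Every Firefox visit, then just the number of them.
grep Firefox "$log"
grep -c Firefox "$log"
```

The search is case-sensitive, so it matches Firefox but not firefox; -i would make it case-insensitive.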