How to Find Duplicate Data in a Linux Text File With uniq
As a Linux user, you may need to find duplicate entries in a text file, such as a list of names, email addresses, or any other data you want to double-check for accuracy. Most Linux distributions ship a command-line tool called uniq (part of GNU coreutils) that reports or filters repeated lines. One thing to know up front: uniq only compares each line with the line directly before it, so it finds duplicates only when they are adjacent. In practice, you usually sort the file first and pass the sorted result to uniq.
Step 1: Open the terminal
The first step is to open a terminal on your Linux system. On many desktop environments, including Ubuntu's default desktop, you can press Ctrl+Alt+T. Otherwise, search for “terminal” in your application launcher.
Step 2: Navigate to the directory containing the text file
Once the terminal is open, navigate to the directory containing the text file you want to check. Use the cd command to change directories. For example, if your file is in the Documents folder inside your home directory, type:
cd ~/Documents
The ~ stands for your home directory, so this command works no matter which directory you are currently in.
Step 3: Use the uniq command
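Now that you are in the correct directory, you can use the uniq command to find duplicated data in your text file. Its general syntax is:
uniq [options] [input_file [output_file]]
If you leave out input_file, uniq reads from standard input, and if you leave out output_file, it prints the result to the terminal. Keep in mind that uniq only compares each line with the one directly above it, so duplicates are detected only when they sit on consecutive lines. To catch duplicates anywhere in the file, sort it first and pipe the sorted output into uniq.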
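The short sketch below shows why the input usually needs sorting. It assumes a POSIX shell with sort and uniq available; the three sample words are made-up data, and the variable names unsorted and sorted are only for illustration.

```shell
# "apple" appears twice, but the two copies are not on consecutive lines.
# uniq compares only adjacent lines, so -d finds nothing here.
unsorted=$(printf '%s\n' apple banana apple | uniq -d)

# sort groups identical lines together, so uniq -d now reports "apple".
sorted=$(printf '%s\n' apple banana apple | sort | uniq -d)

printf 'without sort: [%s]\n' "$unsorted"   # without sort: []
printf 'with sort:    [%s]\n' "$sorted"     # with sort:    [apple]
```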
You can customize the output of the command with various options. For example, the -d option shows only the lines that are repeated, printing one copy of each. Here's the command to enter to find all the lines that are duplicated anywhere in the file:
sort your_file.txt | uniq -d
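Here is a minimal sketch of -d on a small list of email addresses, plus the -c option, which prefixes each line with the number of times it occurs. The sample addresses and the throwaway directory are hypothetical stand-ins for your own your_file.txt; the awk filter that keeps only counts above 1 is one possible way to narrow the list, not the only one.

```shell
# Hypothetical sample data in a throwaway directory; use your own file instead.
cd "$(mktemp -d)" || exit 1
printf '%s\n' alice@example.com bob@example.com alice@example.com \
  carol@example.com bob@example.com alice@example.com > your_file.txt

# One copy of each duplicated line.
sort your_file.txt | uniq -d
# alice@example.com
# bob@example.com

# -c adds an occurrence count to each line; awk keeps only counts above 1,
# and sort -rn lists the most frequent duplicates first.
sort your_file.txt | uniq -c | awk '$1 > 1' | sort -rn
```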
If you want to save the output, use the > redirection operator followed by the name of a new file. Be careful: > overwrites the file if it already exists. For example, this command saves all the duplicated lines to a file called “duplicates.txt”:
sort your_file.txt | uniq -d > duplicates.txt
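As a sketch, assuming a POSIX shell, the same result can also be written using uniq's own output_file argument: passing - as the input file tells uniq to read from standard input (the sorted data from the pipe), and the next argument names the file to write. The color list and the file name duplicates2.txt are hypothetical.

```shell
# Hypothetical sample data in a throwaway directory; use your own file instead.
cd "$(mktemp -d)" || exit 1
printf '%s\n' red green red blue green red > your_file.txt

# Redirect the duplicates into a file (">" replaces any existing contents).
sort your_file.txt | uniq -d > duplicates.txt

# Same result: "-" means "read standard input", then name the output file.
sort your_file.txt | uniq -d - duplicates2.txt

cat duplicates.txt
# green
# red
```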
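You can also use the -u option, which does the opposite of -d: it prints only the lines that are not repeated, meaning lines that appear exactly once in the file. Every line that has a duplicate is left out entirely, and the original file is not modified. Here's an example of the command to extract only the unique lines into a new file:
sort your_file.txt | uniq -u > unique_lines.txt
If you instead want to drop the extra copies but keep one instance of every line, run uniq on the sorted file without any options, or use sort -u your_file.txt.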
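To compare these behaviors on the same input, here is a minimal sketch assuming a POSIX shell. The fruit names and the output file names (other than unique_lines.txt and duplicates.txt from the article) are hypothetical.

```shell
# Hypothetical sample data in a throwaway directory; use your own file instead.
cd "$(mktemp -d)" || exit 1
printf '%s\n' cherry apple banana apple cherry apple > your_file.txt
# Sorted, the file reads: apple apple apple banana cherry cherry

# Keep one copy of every distinct line: apple, banana, cherry.
sort your_file.txt | uniq > deduplicated.txt

# sort -u gives the same result in a single command.
sort -u your_file.txt > deduplicated_sort_u.txt

# Keep only lines that occur exactly once: banana.
sort your_file.txt | uniq -u > unique_lines.txt

# Keep one copy of each line that occurs more than once: apple, cherry.
sort your_file.txt | uniq -d > duplicates.txt
```

Note that -u and -d split the distinct lines into two groups: together, unique_lines.txt and duplicates.txt contain every line in deduplicated.txt exactly once.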
Conclusion
Finding duplicate data in a Linux text file is straightforward with the uniq command. Sort the file first so that identical lines sit next to each other, then use uniq -d to list the duplicated lines or uniq -u to extract the lines that appear only once. This combination is useful for anyone working with lists or other line-based data in a Linux environment.