Regular expressions are used to find matches in text. The following is a real-world application of regex in C# and Java.
CSV (comma-separated values) files store each record as a line of values separated by commas. E.g.:
name,line1,line2,city,zip code,country
You can easily use String.Split() in C# to get all the values. However, there are cases where the data itself contains a comma. E.g.:
"Mr. John Doe, Jr.",7926 Glenbrook Dr., 14623
In this case, a regular expression (regex) can be used to determine whether a comma is inside quotes or not.
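To make the problem concrete, here is a minimal Java sketch of what a plain comma split does to the line above. The class and method names (NaiveSplitDemo, naiveSplit) are hypothetical and only for illustration; Java's String.split stands in here for C#'s String.Split.

```java
class NaiveSplitDemo {
    // Hypothetical helper: splits on every comma, with no awareness of quotes.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String line = "\"Mr. John Doe, Jr.\",7926 Glenbrook Dr., 14623";
        String[] parts = naiveSplit(line);
        // The quoted name is cut in two, so the line yields 4 parts instead of 3.
        System.out.println(parts.length + " parts");
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The first two parts come back as `"Mr. John Doe` and ` Jr."`, which is exactly the case the regex approach is meant to avoid.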
C# Example:
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public string[] parseCSV(string line)
    {
        List<string> datalist = new List<string>();
        /*
         * Define a regular expression for CSV.
         * This pattern will match on either quoted text or text between commas, including
         * whitespace, and accounting for beginning and end of line.
         */
        Regex rx = new Regex("\"([^\"]*)\"|(?<=,|^)([^,]*)(?:,|$)",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);
        // Find matches.
        MatchCollection matches = rx.Matches(line);
        // Report the number of matches found.
        Console.WriteLine("{0} matches found.", matches.Count);
        // Collect the value of each match.
        foreach (Match match in matches)
        {
            if (match.Groups[1].Success)
                datalist.Add(match.Groups[1].Value); // value inside quotes
            else
                datalist.Add(match.Groups[2].Value); // unquoted value between commas
        }
        return datalist.ToArray();
    }

Java Example:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public String[] parse(String csvLine) {
        // Same pattern as the C# example: quoted text, or text between commas.
        Pattern csvPattern = Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?:,|$)");
        Matcher matcher = csvPattern.matcher(csvLine);
        List<String> allMatches = new ArrayList<>();
        while (matcher.find()) {
            String match = matcher.group(1); // null when the value was not quoted
            if (match != null) {
                allMatches.add(match);
            } else {
                allMatches.add(matcher.group(2));
            }
        }
        return allMatches.toArray(new String[0]);
    }
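As a quick check, the sketch below runs the same pattern against the sample line from earlier. The class name CsvRegexDemo and the choice to return a List are assumptions made for illustration; only the pattern itself comes from the examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvRegexDemo {
    // Same pattern as in the examples above: quoted text, or text between commas.
    private static final Pattern CSV_PATTERN =
            Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?:,|$)");

    static List<String> parse(String csvLine) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = CSV_PATTERN.matcher(csvLine);
        while (matcher.find()) {
            // Group 1 is set for quoted values, group 2 for unquoted ones.
            String quoted = matcher.group(1);
            fields.add(quoted != null ? quoted : matcher.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Mr. John Doe, Jr.\",7926 Glenbrook Dr., 14623";
        for (String field : parse(line)) {
            System.out.println("[" + field + "]");
        }
        // Prints:
        // [Mr. John Doe, Jr.]
        // [7926 Glenbrook Dr.]
        // [ 14623]
    }
}
```

Note that the leading space in the last field is preserved, since the pattern includes whitespace; call trim() on each value if that is not wanted.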
Now, your turn!